Preface


Organization 





This publication describes checkpoint/restart for MVS/Extended Architecture 
(MVS/XA). The checkpoint/restart program allows programmers and system 
analysts to: 

• Record information about a job at designated checkpoints.

• Restart a job at the beginning of a step or at a checkpoint within a step.


Checkpoint/Restart runs in 31-bit addressing mode when necessary and can call 
other components that run in 24-bit addressing mode. 


This publication is organized as follows: 


• Chapter 1, “Introduction” on page 1, describes in general terms
checkpoint/restart, its components, its dependencies, and information on data
set security.

• Chapter 2, “Coding a Checkpoint” on page 7, describes how to set up
checkpoints for job steps.

• Chapter 3, “Requesting Restart” on page 31, describes how to request
various types of restart.

• Chapter 4, “Checkpoint/Restart Processing” on page 51, describes how to
use checkpoint/restart with user data sets, tapes, and other programs.

• Appendix A, “Checkpoint/Restart Codes” on page 71, lists completion
codes, return codes, and reason codes issued by checkpoint/restart.

• Appendix B, “End-of-Volume Checkpoint Routine” on page 79, describes
how to establish a checkpoint at end-of-volume.

• “List of Abbreviations” on page 81 defines the acronyms used in this book.

• The Index is a subject index to this publication.
Prerequisite Knowledge


To use this book efficiently, you should be familiar with the following
topics:

• Job control language

• Data management


Required Publications 


• MVS/Extended Architecture JCL, GC28-1148

• MVS/Extended Architecture Data Administration Guide, GC26-4140

• MVS/Extended Architecture Data Administration: Macro Instruction Reference,
GC26-4141


Related Publications 


Within the text, references are made to the publications listed in the table below:

Short Title                   Publication Title                         Order Number

Checkpoint/Restart SVC Logic  MVS/Extended Architecture
                              Checkpoint/Restart Supervisor Call Logic

Data Administration: Macro    MVS/Extended Architecture Data            GC26-4141
Instruction Reference         Administration: Macro Instruction
                              Reference

Data Facility Product:        MVS/Extended Architecture Data            GC26-4267
Customization                 Facility Product Version 2:
                              Customization

IBM 3800 Printing Subsystem   IBM 3800 Printing Subsystem               GC26-3846
Programmer’s Guide            Programmer’s Guide

JCL User’s Guide              MVS/Extended Architecture                 GC28-1351
                              JCL User’s Guide

JCL Reference                 MVS/Extended Architecture                 GC28-1352
                              JCL Reference

JES2 Initialization and       MVS/Extended Architecture System          SC23-0065
Tuning                        Programming Library: JES2
                              Initialization and Tuning

JES3 Initialization and       MVS/Extended Architecture System          SC23-0059
Tuning                        Programming Library: JES3
                              Initialization and Tuning

JES3 Introduction             MVS/Extended Architecture                 GC23-0049
                              JES3 Introduction

JES3 Messages                 MVS/Extended Architecture                 GC23-0062
                              Message Library: JES3 Messages

Magnetic Tape Labels and File MVS/Extended Architecture                 GC26-4145
Structure Administration      Magnetic Tape Labels and File
                              Structure Administration

Supervisor Services and Macro MVS/Extended Architecture System          GC28-1154
Instructions                  Programming Library: Supervisor
                              Services and Macro Instructions

System Generation             MVS/Extended Architecture                 GC26-4148
                              Installation: System Generation

System Messages               MVS/Extended Architecture                 GC28-1376
                              Message Library: System                   GC28-1377
                              Messages, Volumes 1 and 2

VSAM Administration: Macro    MVS/Extended Architecture VSAM            GC26-4152
Instruction Reference         Administration: Macro Instruction
                              Reference
Summary of Changes


Release 3.0, June 1987


Restructure


Most of the text from “Appendix B. End-of-Volume Checkpoint Routine” has
been removed and placed in Data Facility Product: Customization.


Release 1.0, April 1985 


Enhancements
• Support for the IBM 3480 Magnetic Tape Subsystem has been added.
• Information to support the IBM 4245, 4248, and 3262 Model 5 Printers has
been added under “Use of CHKPT with Other Macro Instructions” on
page 24.


Version 2 Publications 


The Preface includes new order numbers for Version 2. 


Chapter 1. Introduction


Types of Restart 


Checkpoint/Restart is a method of recording information about a job at
programmer-designated checkpoints so that the job can be restarted at one of these
checkpoints or at the beginning of a job step.

A checkpoint is taken when your program issues the CHKPT macro instruction.
This macro causes the contents of the program’s virtual storage area and certain
system control information to be written as a series of records in a data set. If the
job terminates abnormally or produces erroneous output, these records can be
retrieved from the data set and the job can be restarted. Restart can take place
immediately (initiated by the operator at the console) or be deferred until the job is
resubmitted. In either case, you can avoid the time-consuming process of
rerunning the entire job from the beginning.


The system allows two types of restart:

• Automatic step restart

• Deferred step restart

Checkpoint/Restart allows two types of restart:

• Automatic checkpoint restart

• Deferred checkpoint restart

Automatic restarts occur in different ways:

• Automatic step restart restarts the job at the beginning of the job step that
abended.

• Automatic checkpoint restart restarts the job at the last checkpoint taken before
the job failed.


Deferred restart occurs when a job is resubmitted with the RESTART parameter
specified on the JOB statement.

• Deferred step restart occurs at the beginning of the job step specified in the
RESTART parameter of the JOB statement.

• Deferred checkpoint restart occurs at the checkpoint specified in the
RESTART parameter of the JOB statement.
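For illustration, the two JOB statements below show the forms of the RESTART parameter described above; each would begin a separate resubmission of the job, not the same one. The job name PAYROLL, the accounting field, the step name STEP3, and the checkid CHECK004 are hypothetical. The first statement requests a deferred step restart at the beginning of STEP3; the second requests a deferred checkpoint restart at checkpoint CHECK004 within STEP3, which also needs a SYSCHK DD statement (not shown here).

```jcl
//PAYROLL  JOB  (ACCT),'PAYROLL RUN',RESTART=STEP3
//PAYROLL  JOB  (ACCT),'PAYROLL RUN',RESTART=(STEP3,CHECK004)
```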


Components of Checkpoint/Restart


CHKPT Macro Instruction 


The CHKPT macro is coded in your program to cause a checkpoint to be taken. 


When a CHKPT macro is executed, the contents of the program’s virtual storage 
data area and certain system control information are written in a data set as a series 
of records. The series of records is called a checkpoint entry, and the data set in 
which they are written is called a checkpoint data set. The checkpoint entry, which 
has a unique programmer-specified or system-generated identification called a 
checkid, is retrieved from the data set when restart occurs. 


For a detailed explanation of how to establish a checkpoint, see 
Chapter 2, “Coding a Checkpoint” on page 7. 


End-of-Volume Exit Routine


The end-of-volume exit routine is coded in your program to allow execution of the
CHKPT macro instruction each time the processing of a multivolume physical
sequential user data set is continued on another volume.

Appendix B, “End-of-Volume Checkpoint Routine” on page 79 contains a brief
description of this routine. For more detailed information about coding the EOV
exit for physical sequential data sets, see Data Facility Product: Customization.


Checkpoint at End-of-Volume Facility 


A system-supplied routine takes checkpoints at end-of-volume occurrences for 
multivolume QSAM and BSAM data sets and can be invoked with a JCL 
parameter. “Checkpoint at End-of-Volume” on page 8 contains a brief 
description of this. For more detailed information, see Data Facility Product: 
Customization. 


RD (Restart Definition) Parameter 


The RD parameter, when used, is coded on the JOB or EXEC statement. It requests
automatic step restart if the job fails, suppresses all or part of the action of the
CHKPT macro instruction, or both. For more detailed information about this
parameter, see Chapter 3, “Requesting Restart” on page 31.
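A sketch of RD coded on EXEC statements follows. The comment statements summarize the four RD values as we understand them from the JCL publications; confirm them against JCL Reference for your release. The step names and the program names LOADDB and REPORT are hypothetical.

```jcl
//*  RD=R    AUTOMATIC RESTART ALLOWED; CHKPT TAKES CHECKPOINTS
//*  RD=RNC  AUTOMATIC STEP RESTART ALLOWED; CHKPT SUPPRESSED
//*  RD=NR   NO AUTOMATIC RESTART; CHKPT TAKES CHECKPOINTS
//*  RD=NC   NO AUTOMATIC RESTART; CHKPT SUPPRESSED
//STEP1    EXEC PGM=LOADDB,RD=R
//STEP2    EXEC PGM=REPORT,RD=NC
```

If RD is coded on the JOB statement instead, it applies to every step of the job, and any RD parameters on the EXEC statements are ignored.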
RESTART Parameter


The RESTART parameter, coded in the JOB statement, is required when a job is 
resubmitted for deferred restart. It specifies the step (for deferred step restart) or 
the step and the checkpoint within that step (for deferred checkpoint restart) where 
restart should begin. For more detailed information about this parameter, see 
Chapter 3, “Requesting Restart” on page 31.


SYSCHK DD Statement 


The SYSCHK DD statement is required when a job is resubmitted for deferred 
checkpoint restart. It defines the checkpoint data set for the job being restarted. 
For more detailed information about this statement, see “SYSCHK DD Statement” 
on page 49. 
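A minimal sketch of where the SYSCHK DD statement goes in a job resubmitted for deferred checkpoint restart: after the JOB statement (and after a JOBLIB DD statement, if there is one) and before the first EXEC statement. All names are hypothetical, and the checkpoint data set PAYROLL.CHKPT is assumed to be cataloged; an uncataloged data set would also need UNIT and VOLUME parameters.

```jcl
//PAYROLL  JOB  (ACCT),'PAYROLL RUN',RESTART=(STEP3,CHECK004)
//SYSCHK   DD   DSNAME=PAYROLL.CHKPT,DISP=OLD
//STEP1    EXEC PGM=PAYEDIT
//*  REMAINING STEPS, INCLUDING STEP3, FOLLOW AS IN THE ORIGINAL JOB
```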


SYSCKEOV DD Statement 


The SYSCKEOV DD statement defines the checkpoint data set that is to contain 
the checkpoint records generated when a checkpoint is taken at end-of-volume.
For more information about this statement, see “SYSCKEOV DD Statement” on 
page 10. 


Checkpoint/Restart Dependencies


CKPTREST System Generation Specification 


The CKPTREST macro instruction, coded at system generation, specifies which 
system completion codes indicate that a step is eligible for restart. During system 
generation, a standard IBM-defined set of system completion codes (codes 
produced when the system executes abend) is placed in a table of eligible codes. 
The table becomes part of the control program. CKPTREST, which is optional, 
can be used to delete system completion codes from the table and to add user 
completion codes (codes produced when your program executes abend) to the 
table. 


For more information on the CKPTREST macro, see System Generation Reference. 


Job Journal Requirements 


The job journal is a sequential data set that resides on the spool volume of the job 
entry subsystem. It contains a set of selected job-related control blocks that are 
critical to automatic restart processing. 


The job journal is necessary because the scheduler control blocks are maintained in 
the scheduler work area (SWA) in pageable storage, rather than in a job queue on 
external storage. When a job or the system itself fails, the space that contains the 
SWA and its job control blocks is lost. Because it preserves up-to-date copies of 
certain critical control blocks, the job journal makes it possible to reconstruct the 
SWA. SWA control blocks are restored to the state they were in just prior to the 
failing step for automatic step restart. For automatic checkpoint restart, SWA
control blocks are reconstructed as they appeared in the most recently issued 
CHKPT. 


For JES2 or JES3, job journaling is required before a job is eligible for an 
automatic restart. 


For JES2: Job journaling is provided to a job in JES2 in one of the following 
ways: 


• JOURNAL is specified on the JOB CLASS CHARACTERISTICS statement
in the SYS1.PARMLIB member used for initializing JES2 (see JES2
Initialization and Tuning).

• The JCL for the job specifies the RD parameter on the JOB statement or the
EXEC statement (see the publication JCL).


For JES3: Job journaling is provided to a job in JES3 in one of three ways: 


• JOURNAL=YES is specified on the CLASS initialization statement. For a
description of the CLASS statement, see JES3 Initialization and Tuning.

• JOURNAL=YES is specified on the MAIN JCL statement, which overrides the
CLASS initialization statement. For a description of the MAIN statement, see
the publication JCL.

• The JCL specifies either the RESTART or the RD parameter on the JOB
statement or the RD parameter on the EXEC statement.
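As an illustration of the second JES3 method, the sketch below places a //*MAIN statement after the JOB statement to request journaling for this job, overriding the class default. The job, step, and program names are hypothetical.

```jcl
//PAYROLL  JOB  (ACCT),'PAYROLL RUN'
//*MAIN    JOURNAL=YES
//STEP1    EXEC PGM=PAYEDIT
```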


The system creates a job journal to hold restart information for any job specifying 
job journaling. After a system failure and the system restart of the failing main 
processor, the jobs in execution that requested job journaling are restarted (warm 
started) by the system. If a job is eligible for automatic restart, the operator is sent 
the message IEF225D asking if the job should be restarted. If the job is not 
eligible for restart or if the operator indicates that the restart should not be 
attempted, any scratch or VIO data sets the job allocated are deleted, and the job 
terminates. Jobs that frequently use scratch and/or VIO data sets should request 
job journaling. 


User Data Set Security 


To restart a job step that processes RACF-protected data sets, you must be
authorized to access all RACF-protected data sets that were open when the
checkpoint was taken; the userid specified on the JOB statement of the job being
restarted must be authorized to access each of those data sets. To restart a job
step that processes password-protected data sets, each password-protected data set
that was open at checkpoint time must have had a pointer to its password in the
ACB when the checkpoint was taken, or the password must be known to the operator.
Checkpoint Data Set Security


An unauthorized user (one who is not authorized by the authorized program
facility (APF), is not in supervisor state, and does not have the system key, keys 0
through 7) cannot communicate directly with the checkpoint data set. The
unauthorized user can take checkpoints and do restarts, but, after the operation has
begun, the system uses a security interface to prevent unauthorized alteration of
the checkpoint data set.


If an unauthorized user tries to access a checkpoint data set directly, an error 
message appears, and the job terminates. In addition, the unauthorized user cannot 
take a checkpoint on a new checkpoint data set if another data control block 
(DCB) is already open to that data set. 


Offline security of the checkpoint data set is ensured by the operator. 
Chapter 2. Coding a Checkpoint


This chapter defines a checkpoint and explains how you may establish checkpoints 
to restart job steps. 


The CHKPT macro instruction is coded in your program. When the CHKPT 
macro executes, job step information about the user’s program, virtual storage data 
areas, data set position, and supervisor control is written as a checkpoint entry in a 
checkpoint data set. The point at which this information is saved becomes a 
checkpoint from which a restart may be performed. After the checkpoint entry is 
written, control returns to your program at the instruction following the CHKPT 
macro.


The topics discussed in this chapter are: 

• When to code a checkpoint
• Coding the DCB for a checkpoint data set
• Coding the DD statement for a checkpoint data set
• Using checkpoint data sets
• Coding the CHKPT macro instruction
• Canceling a checkpoint
• Requesting serially reusable resources


When to Code a Checkpoint 


Using the CHKPT macro instruction, you can code a checkpoint within your
program or at end-of-volume. The checkpoint routine records certain system
information, user storage (including user programs in user storage), and
information about the data sets used by the step executing the CHKPT macro.
Recorded information includes:

• For all data sets, the information that can be coded on a DD statement; for
example, device type and volume serial numbers. (The contents of the step’s
JFCBs and JFCB extensions are recorded, as well as the contents of the
GDGNTs (generation data group name tables).)

• For data sets that are open at the checkpoint, that reside on magnetic tape or
direct access devices, and that use the BSAM, QSAM, QISAM, BPAM, VSAM, or
EXCP access methods, the information needed to reposition the data set if
restart occurs at a checkpoint.


If an output data set is extended onto a second direct access volume after a
checkpoint is taken (because end-of-volume occurred on the first volume and either
no more space was available on that volume or the data set already had 16 extents),
and restart subsequently occurs at that checkpoint, the system does not delete the
extension of the data set.


Note that, when a step using the universal character set (UCS) feature is restarted, 
the system does not determine whether the UCS buffer or FCB buffer is properly 
loaded, nor does it alert the operator to the UCS or FCB requirements of the step. 


Checkpoint at End-of-Volume 


The checkpoint at end-of-volume provides an external checkpoint function that
requires no modification of the user program. It executes immediately before any
checkpoint invoked in a user EOV exit routine. Redundant checkpoints are taken if
a user exit routine is supplied and checkpoint at end-of-volume is also invoked for
the same end-of-volume occurrence of a data set.

Checkpoint at end-of-volume is executed only if the CHKPT=EOV parameter is
specified for a multivolume data set or for the second, third, or later data set of a
concatenation. If the first data set of a concatenation is a multivolume data set,
this parameter is also valid on that DD statement.


Checkpoint at end-of-volume issues a CHKPT macro. If an unsuccessful return
code is presented at EOV, it retries the checkpoint one more time in the
following situation:

Return code 08: end-of-extent or end-of-volume occurred for the SYSCKEOV DD
before the checkpoint entry was completed. The checkpoint is retried so that
secondary space, if provided, can be used.

For the other unsuccessful return codes (0C, 10, 14, or 18) and for an unsuccessful
retry after return code 08, the following message is generated:


IEC067I CHKPT=EOV FACILITY EXECUTED UNSUCCESSFULLY


This message is preceded by a checkpoint/restart error message (prefixed “IHJ”)
that describes the nature of the problem. For more detail on these error messages,
see System Messages.


In any event, processing continues, and this message is generated for each 
unsuccessful invocation of the checkpoint at end-of-volume until step termination 
occurs. Operator intervention is required to halt further processing prior to step 
end. 
CHKPT at EOV JCL Parameter


The CHKPT=EOV parameter on a DD statement requests that a checkpoint be taken
for the job step at end-of-volume for the data set defined by that DD statement.
The following restrictions apply to the use of this JCL parameter:

1. The DD must define a QSAM or BSAM data set.

2. The QSAM or BSAM data set must be a multivolume data set or the second,
third, or later data set in a concatenation.

3. The DD statement must not define a SYSOUT, DD *, or DD DATA type data set.

4. The JCL parameters DDNAME and DYNAM cannot be specified on the same
DD statement with this parameter.

5. The DD must not define a checkpoint data set.


The following actions result if any of these restrictions are not observed:

1, 2, 5   No action; no checkpoints are taken, and processing continues as if
          the CHKPT=EOV parameter were not specified.

3, 4      JCL error messages appear, and processing is not initiated.

Examples:

//DD1      DD   DSN=DSN1,DISP=OLD,VOL=SER=(TAPE01,TAPE02),
//              UNIT=TAPE,CHKPT=EOV

//DD2      DD   DSN=DSN2,DISP=OLD
//         DD   DSN=DSNX,DISP=OLD,CHKPT=EOV
//         DD   DSN=DSNY,DISP=OLD,CHKPT=EOV

//DD3      DD   DSN=DSN3,DISP=NEW,VOL=(,,,5),
//              UNIT=DISK,SPACE=(CYL,(300,300)),CHKPT=EOV

Notes:

1. DD1: multivolume data set

2. DD2: concatenated data set

3. DD3: multivolume data set on disk
SYSCKEOV DD Statement


The SYSCKEOV DD statement defines the checkpoint data set that contains the
checkpoint records generated by the checkpoint at end-of-volume. The same
restrictions that apply to other checkpoint DD statements (see “Coding the DD
Statement for a Checkpoint Data Set” on page 11) also apply to this DD
statement, with the following exceptions:

• DISP=MOD is recommended to reduce loss of checkpoint data in the event of
a system failure while a checkpoint is being taken.

• This DD must define a sequential BSAM data set (BPAM is not supported).

• All the DCB parameters are provided by the checkpoint at EOV routine and
should not be coded on the DD statement.


Example: 


//SYSCKEOV DD DSN=CKPTDS,UNIT=TAPE,DISP=MOD


Checkpoint in Exit Routines 


The CHKPT macro instruction must not be used in an exit routine other than the 
end-of-volume exit routine. You may take a checkpoint when a BSAM or QSAM 
data set reaches end-of-volume. For more detailed information about coding the 
EOV exit for physical sequential data sets, see Data Facility Product: Customization.


Coding the DCB for a Checkpoint Data Set 


Required DCB Parameters 


You must provide either a DCB for the checkpoint data set or the name of the DD
statement for the checkpoint data set. (Data Administration: Macro Instruction
Reference contains detailed information about coding DCBs.) If a DCB is
provided, the following parameters must be included:

DSORG=PS or PO (BSAM or BPAM data set organization)

MACRF=W (WRITE macro instruction)

RECFM=U or UT (undefined record format)

DEVD=DA or TA (direct access or tape device)

TRTCH=C (data conversion with odd parity; parameter required only if the
data set is on a 7-track magnetic tape)

DDNAME= (name of DD statement for checkpoint data set)

You must code the DSORG, MACRF, and DDNAME operands in the DCB
macro. The RECFM, DEVD, and TRTCH operands may be coded in the DCB
macro instruction or in the related DD statement; RECFM and TRTCH may be
coded as subparameters of the DCB parameter. Because RECFM and DEVD
have default values of U and DA respectively, they need not be provided explicitly
in either the DCB macro or the DD statement.


DCB Options 


You may optionally provide the following DCB parameters:


OPTCD=W (write validity checking) 
RECFM=UT (track overflow) 
NCP=2 (number of channel programs) 


NCP=2 and OPTCD=C (chained scheduling) 


These parameters are discussed in detail in Data Administration: Macro Instruction 
Reference. 
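As a sketch, assuming the program's DCB macro leaves these fields unspecified, options such as write validity checking and two channel programs can also be supplied through the DCB parameter of the checkpoint DD statement. The ddname CHKDD, the data set name, the unit name, and the space quantities are hypothetical; the secondary quantity must still hold at least two complete checkpoint entries.

```jcl
//CHKDD    DD   DSNAME=PAYROLL.CHKPT,DISP=(NEW,CATLG),UNIT=SYSDA,
//              SPACE=(TRK,(30,30)),DCB=(OPTCD=W,NCP=2)
```

A value coded in the DCB macro takes precedence over the same field coded on the DD statement, so the DD statement supplies only what the program omits.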


Notes on the DCB for a Checkpoint Data Set 


The following restrictions apply to the DCB for a checkpoint data set: 


• The checkpoint routine writes checkpoint records in blocks of either 400 or
4096 bytes.

• Requests for two channel programs or chained scheduling apply to the writing
of all checkpoint records. Such requests do not apply to the reading of
checkpoint records for a restart.

• OPTCD=Q cannot be specified in the DCB, because checkpoints cannot be
taken to an ISO/ANSI/FIPS tape.


Coding the DD Statement for a Checkpoint Data Set 


The DD statement for the checkpoint data set must define the data set in the usual 
way. The publication JCL contains detailed information on coding the DD 
statement. The only restrictions on the DD statement are: 


• A checkpoint data set, because it is an output data set, cannot be a
concatenated data set.

• The UNIT parameter must specify a tape or direct access device supported by
BSAM or BPAM. The device can be specified by referring to a specific device,
a device type, or a group of devices. If direct access is specified, the device
must not be shared with another processor. To avoid allocation to a shared
DASD, generate (at system generation time) a special generic device name that
includes only nonshared DASD of a single device type, and use that generic name.

• DEFER should not be coded in the DD statement.

• DISP=SHARE cannot be specified.

• Secondary space allocation may be requested by the increment subparameter
(see “Notes on the DD Statement”). The increment subparameter must
specify at least enough space to contain two complete checkpoint entries. For
information on how to calculate the length of a checkpoint entry, see
“Storage Estimates for Checkpoint Data Sets” on page 13.

• The LABEL parameter of the DD statement describes the labels of a data set
on magnetic tape. For a checkpoint data set, you must specify only IBM
standard labels (SL or SUL). Nonstandard labels (NSL), no labels (NL), or
International Organization for Standardization/American National Standards
Institute (ISO/ANSI) labels (AL or AUL) cannot be specified for a checkpoint
data set. If the label type is not specified, the operating system assumes that
the data set has IBM standard labels (SL or SUL).

• OPTCD=Q cannot be specified as a DCB subparameter.

• CHKPT=EOV is ignored (checkpoint resets the value in the JFCB).

• The RLSE subparameter of the SPACE parameter cannot be used on the DD
statement for a checkpoint data set.


Notes on the DD Statement 


• The initial disposition of the data set (as specified in the DISP operand of the
DD statement) is used to position the checkpoint data set each time it is
opened. “How Checkpoint Entries Are Written” on page 16 discusses this and
describes a specific circumstance in which the only checkpoint entry on the data
set is the last checkpoint taken, whether or not that entry is valid.

• The final and conditional dispositions of the data set have their normal
meanings. However, if termination is occurring and an automatic restart at a
checkpoint is to occur, the system automatically keeps all data sets that are in
use by the job, including the checkpoint data set.

• If end-of-volume is encountered while a checkpoint is being written on a direct
access or tape volume, message IHJO001 (invalid checkpoint) is issued. Control is
returned to you with an X'08' return code in register 15 and a reason code of
027 in register 0.


Examples of DD statements for the checkpoint data set are: 


//ddname   DD   DSN=dsname,UNIT=TAPE,DISP=(MOD,KEEP)

//ddname   DD   DSN=dsname,UNIT=SYSDA,
//              DISP=(NEW,DELETE,KEEP),SPACE=(CYL,(15,17)),
//              VOL=SER=CKPTDS
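A partitioned checkpoint data set, in which each entry becomes a member named by its checkid, might be defined as in this sketch. The third SPACE subparameter reserves directory blocks for those member names, and the program's DCB would specify DSORG=PO. The ddname, data set name, and quantities are hypothetical, and SYSDA stands in for an installation generic name that covers only nonshared DASD, as the UNIT restriction above requires.

```jcl
//CHKPDS   DD   DSNAME=PAYROLL.CHKPDS,DISP=(NEW,CATLG),UNIT=SYSDA,
//              SPACE=(TRK,(30,30,5))
```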
Using Checkpoint Data Sets


The checkpoint data set must be empty at the beginning of a job step. For more
information, see “How Checkpoint Entries Are Written” on page 16.


Using a Generation Data Set as a Checkpoint Data Set 


You must take precautions when using a generation data set as the checkpoint
data set. For a deferred checkpoint restart, the checkpoint data set, as specified in
the SYSCHK DD statement, is allocated to the initiator, and an entry is made in
the initiator’s generation data group name table (GDGNT). The GDGNT is never
changed unless a new level is cataloged, and it remains for the life of the job and
any future restarts under the same initiator. A later restart that uses a data set
from the same generation data group uses the original contents of the GDGNT to
obtain the qualified name of the generation data set. For more information, see
“Generation Data Sets” on page 52.


If, after the entry is made in the GDGNT, the levels of the generation data group
change in the catalog and the checkpoints are in a different generation, the
generation retrieved as the checkpoint data set at restart time may not be the
desired one. To avoid this problem, do not change the number of existing
generations for the life of the IPL after the checkpoint data set is used for deferred
restart. Be sure the same initiator is not used twice to do restarts from different
levels of the same generation data group.
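For reference, a SYSCHK DD statement that names the checkpoint data set by relative generation number looks like the sketch below; the data set name is hypothetical. Relative generation numbers such as (0) are resolved through the generation data group name table, so the cautions above about changing the number of generations apply to a statement like this one.

```jcl
//SYSCHK   DD   DSNAME=PAYROLL.CKPTGDG(0),DISP=OLD
```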


Storage Estimates for Checkpoint Data Sets 


The checkpoint data set may be on any direct access device that is not shared
with another processor. The following information can be used as a guideline in
determining the size of this data set.


Figure 1 on page 14 contains the size and number of records written when a 
checkpoint is taken. The number of tracks or the amount of tape occupied by the 
checkpoint data set can be determined by applying the number of records and their 
sizes against either the track capacities of the direct access device or the recording 
density and type for the magnetic tape device. 
Description of Record                    Size (in bytes)   Number Required

Checkpoint header record (CHR)           400               1
DD name table (DDNT)                     400               D/21
Data set descriptor record (DSDR)        400               A/2
Problem program image record (PPIR)      4096              B
Supervisor record (SUR)                  4096              C
Subsystem checkpoint record (SSCR)       4096              1 per SYSIN or
  (SUBSYSTEM data sets)                                    SYSOUT data set
SPIE                                     20/4096           E/20
VSAM                                     4096              1

Figure 1 (Part 1 of 2). Determining the Size of a Checkpoint Data Set


Records Code:

A = the number of data sets defined in the job step

B = (Storage allocated - freespace + (X'14' * number of gaps in
    allocated storage)) / X'FEC'

    For example:

    Storage allocated = 9000 to 4FFFF
                    and 1A000000 to 2FFFFFFF

    Freespace:

    Address                 Size       Number of Gaps
    A200 to A3FF            200        1
    32900 to 35FFF          3700       1
    40000 to 47FFF          8000       1
    50000 to 19FFFFFF                  1 (not allocated; counts as a gap)
    1F900000 to 210F8FFF    17F9000    1
    28700000 to 29FFFFFF    1900000    1
                            -------    -
    Total                   3104900    6

    Therefore:

    B = ((47000 + 16000000) - 3104900 + (X'14' * 6)) / X'FEC'

    B = 130C0

C = A variable number of records for system and protected data management
    control blocks. This will be approximately three records plus one record
    for every three DASD data sets and one record for every ten non-DASD
    data sets.

D = Number of JCL-specified data sets dynamically unallocated at checkpoint.

E = One for every 90 ESPIEs established at checkpoint.

Note: All values are in hexadecimal.

Figure 1 (Part 2 of 2). Determining the Size of a Checkpoint Data Set
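The division in the B example does not come out to a whole number of records. Worked in decimal, it shows that the figure's result of X'130C0' corresponds to rounding the quotient up to the next whole record (truncating would give 78015, or X'130BF'). That rounding rule is our reading of the example, not something the figure states.

```latex
\begin{aligned}
\text{allocated}    &= \mathrm{X'47000'} + \mathrm{X'16000000'} = 290816 + 369098752 = 369389568\\
\text{freespace}    &= \mathrm{X'3104900'} = 51398912\\
\text{gap overhead} &= \mathrm{X'14'} \times 6 = 20 \times 6 = 120\\
B &= \left\lceil \frac{369389568 - 51398912 + 120}{\mathrm{X'FEC'}} \right\rceil
   = \left\lceil \frac{317990776}{4076} \right\rceil
   = \lceil 78015.40\ldots \rceil = 78016 = \mathrm{X'130C0'}
\end{aligned}
```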
Dynamic Storage


Checkpoint/Restart’s dynamic storage requirement varies with the resources used
by the caller. The largest impact is caused by the amount of storage allocated to
the job and the amount of storage fragmentation. Checkpoint/Restart’s dynamic
storage is allocated out of subpools 229 and 230.

Fragmented Storage

Fragmented user storage may result in:

• Reduced performance of a checkpoint

• Insufficient storage for a checkpoint to contain a description of the user’s
storage (the checkpoint is not completed).
How Checkpoint Entries Are Written

Nonopened DCB Passed to Checkpoint

• Sequential data sets

  - When a sequential data set’s disposition is NEW or OLD, the checkpoint
    entry is written at the beginning of the data set, even if a previous
    checkpoint call has already written an entry there.

  - When a sequential data set’s disposition is MOD, the checkpoint entry is
    written after the last entry existing in the data set.

• Partitioned data sets

  - When the data set is partitioned, each checkpoint entry is a member, and
    its checkid is its member name. After it writes a checkpoint entry, the
    checkpoint routine executes the STOW macro to add the checkid of the
    entry to the directory of the data set.

  - If the data set is partitioned, regardless of its disposition, the checkpoint
    entry is written after the last entry existing in the data set.

  - If an identical checkid already exists in the directory, the address
    associated with that member is changed to the address of the new checkpoint
    entry. The initial disposition specified for the checkpoint data set has no
    effect on the STOW operation.


Opened DCB Passed to Checkpoint 


If your program has a DCB and opens the checkpoint data set for output, the
checkpoint routine writes a checkpoint entry at the data set’s current position and
does not close the data set. A user-opened checkpoint data set does not have to be
closed after taking the last checkpoint for the job step. In this case, all the
checkpoint entries are saved even though DISP=MOD is not specified, so a deferred
restart can be requested from any of the checkpoints. If the data set is
partitioned, the checkpoint routine executes the STOW macro at the completion of
each entry.


ddname Passed to Checkpoint


If your program provides a ddname instead of a DCB, the checkpoint routine
opens the checkpoint data set the first time it receives that ddname. If your
program has a different ddname on the next execution of the CHKPT macro,
the data set for the previous ddname is closed and the data set for the new
ddname is opened. The open checkpoint data set is closed at step termination.


For sequential data sets, DISP=OLD, NEW, or MOD has no effect on how the 
checkpoint entries are written on the checkpoint data set. The first time each 
checkpoint data set is used, the checkpoint entry is written at the beginning of 
the file and each following entry is written after the previous entry. 


Partitioned data sets are treated in the same manner as nonopened DCBs 
passed to checkpoint. 
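The following is a minimal sketch of the ddname rules for sequential data sets. It assumes a job step can be modeled as a sequence of CHKPT calls. The class and its attributes are hypothetical, and partitioned data sets are not modeled.

```python
from typing import Dict, List, Optional


class DdnameCheckpoints:
    """Hypothetical model of CHKPT with a ddname, within one job step."""

    def __init__(self) -> None:
        self.open_ddname: Optional[str] = None
        self.closed: List[str] = []               # ddnames closed by a switch
        self.contents: Dict[str, List[str]] = {}  # ddname -> checkids written

    def chkpt(self, ddname: str, checkid: str) -> None:
        if ddname != self.open_ddname:
            if self.open_ddname is not None:
                self.closed.append(self.open_ddname)  # previous data set closed
            self.open_ddname = ddname                 # new data set opened
            # First use of a data set: writing starts at its beginning,
            # whatever the DISP. Reuse of a closed ddname is not described in
            # the text; this sketch simply keeps appending (an assumption).
            self.contents.setdefault(ddname, [])
        # Each following entry is written after the previous one.
        self.contents[ddname].append(checkid)
```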


If end-of-volume is encountered during writing of a checkpoint entry on tape, a 
second attempt is made to create the checkpoint entry on another tape if that other 
tape has been allocated. If EOV occurs again before the entry has been completed, 
message IHJ000I (invalid checkpoint) is issued. Control is returned to you with a
X'08' return code in register 15 and a reason code of 27 in register 0. 


The status (opened or closed) and position of a checkpoint data set remain the
same at restart as they were after execution of the CHKPT macro instruction that
established the checkpoint.


The first time a checkpoint data set is opened, checkpoint routines cause open to 
request security information for the data set from the operator. 


Note: A checkpoint data set must contain only checkpoint entries. If a program 
attempts to write its own data in a checkpoint data set, message IEC954I will be 
issued, followed by an abend code of 23F. 


How to Ensure Restart with Sequential Checkpoint Data Sets 


Because abnormal termination or system failure may occur while a new entry is
being written, a checkpoint entry must not be written over a preceding checkpoint
entry; otherwise, restart at the most recent valid checkpoint might not be possible.
Four methods by which you can ensure that restart is possible are suggested below.
All four methods use sequential checkpoint data sets.


Note: Because of the characteristics of partitioned data sets, checkpoint entries 
with dissimilar member names are not written over each other. 


Figure 2 on page 18 shows the use of one sequential checkpoint data set, one data 
control block, and one DD statement (CHECKDD) specifying MOD disposition. 
The user allows the checkpoint routine to open and close the data set each time it 
writes a checkpoint entry. Checkpoint entries are written sequentially in the data 
set. 
Program

         CHKPT CHKDCB

CHKDCB   DCB   DDNAME=CHECKDD,MACRF=W,DSORG=PS


DD Statement

//CHECKDD  DD  UNIT=TAPE,DISP=(MOD,KEEP)


Figure 2. Using One Sequential Checkpoint Data Set to Ensure Restart


An alternative method of using one sequential checkpoint data set, one DCB, and
one DD statement is to have your program open the data set and leave it open.
The disposition may be NEW, OLD, or MOD.


Figure 3 on page 19 shows a way to alternate data sets when all checkpoints are
taken by one CHKPT macro instruction. The data sets are opened by the control
program and identified by two DD statements, CHECKDD1 and CHECKDD2.


The data control block initially refers to CHECKDD1. Before the second 
checkpoint, it is changed to refer to CHECKDD2; before the third checkpoint, it is 
again changed to refer to CHECKDD1; and so forth. One data control block can 
be used for two data sets that are not open at the same time. 


An alternative method of using two sequential data sets is to use two DCBs and
two DD statements specifying NEW or OLD dispositions, and to execute two
CHKPT macro instructions alternately, each referring to a different data set.


The method illustrated in Figure 2 saves all checkpoint entries for possible use in 
deferred restart; the method illustrated in Figure 3 conserves auxiliary storage. 
None of the methods requires a particular device type. 
Program

         DCBD  DSORG=PS              Define IHADCB (dummy
*                                    section that defines
*                                    DCBDDNAM)
         CSECT                       Resume original control
*                                    section
         LA    2,CHECKDCB            Establish CHECKDCB as
*                                    base address for
         USING IHADCB,2              IHADCB
         XC    DCBDDNAM(8),DDHOLD    Exchange ddname in
         XC    DDHOLD(8),DCBDDNAM    CHECKDCB for ddname
         XC    DCBDDNAM(8),DDHOLD    in DDHOLD
         CHKPT CHECKDCB              Open, checkpoint, close
DDHOLD   DC    C'CHECKDD1'
CHECKDCB DCB   DSORG=PS,MACRF=(W),DDNAME=CHECKDD2


DD Statements

//CHECKDD1 DD UNIT=SYSDA,DISP=NEW ...
//CHECKDD2 DD UNIT=SYSDA,DISP=NEW ...


Figure 3. Using Two Sequential Checkpoint Data Sets to Ensure Restart
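The three XC instructions in Figure 3 exchange the two 8-byte ddname fields by exclusive OR, so no work area is needed. The Python sketch below models XC on byte arrays. It shows why the exchange works and which ddname each successive CHKPT would use. The helper names are hypothetical. ASCII is used instead of EBCDIC, which does not affect the exclusive-OR arithmetic.

```python
def xc(target: bytearray, source: bytearray) -> None:
    """Model of XC: each byte of target becomes target XOR source."""
    for i in range(len(target)):
        target[i] ^= source[i]


def exchange(dcbddnam: bytearray, ddhold: bytearray) -> None:
    # The three XC instructions from Figure 3, in order.
    xc(dcbddnam, ddhold)   # XC DCBDDNAM(8),DDHOLD
    xc(ddhold, dcbddnam)   # XC DDHOLD(8),DCBDDNAM
    xc(dcbddnam, ddhold)   # XC DCBDDNAM(8),DDHOLD


dcbddnam = bytearray(b"CHECKDD2")   # DDNAME=CHECKDD2 coded in CHECKDCB
ddhold = bytearray(b"CHECKDD1")     # DDHOLD DC C'CHECKDD1'

used = []
for _ in range(4):                   # four successive checkpoints
    exchange(dcbddnam, ddhold)
    used.append(dcbddnam.decode("ascii"))   # ddname the CHKPT would open

print(used)  # prints ['CHECKDD1', 'CHECKDD2', 'CHECKDD1', 'CHECKDD2']
```

The first checkpoint goes to CHECKDD1, as the text describes, and the data sets then alternate.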


How Checkpoint Entries Are Identified 


Any number of checkpoint entries can be written in a checkpoint data set, and any 
number of checkpoint data sets can be used concurrently. In a sequential 
checkpoint data set, checkids of valid or invalid checkpoint entries in one data set 
should be unique. In a partitioned data set, checkids of valid entries should be 
unique. 


If you specify checkids instead of having the system generate them, you might
inadvertently specify duplicates; the system does not recognize this error. When a
deferred restart at a checkpoint occurs and the checkpoint data set is sequential,
the system searches the data set from its beginning for the specified checkpoint
entry and uses the first entry it finds that has the specified checkid.


If the data set is partitioned, the system searches the data set’s directory to find the 
location of the specified checkpoint entry. If two or more entries having the same 
checkid were written in the data set, the most recent of those entries is the one 
pointed to by the directory, and restart occurs from the most recent entry. 
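The difference between the two search rules can be sketched as follows. The sketch assumes entries are modeled as a list of checkids in the order they were written. The helper names are hypothetical.

```python
from typing import Dict, List, Optional


def find_sequential(entries: List[str], checkid: str) -> Optional[int]:
    """Search from the beginning of the data set; the first match is used."""
    for position, entry_id in enumerate(entries):
        if entry_id == checkid:
            return position
    return None


def build_directory(entries: List[str]) -> Dict[str, int]:
    """Each STOW replaces the directory entry, so the latest write wins."""
    directory: Dict[str, int] = {}
    for position, entry_id in enumerate(entries):
        directory[entry_id] = position
    return directory


def find_partitioned(directory: Dict[str, int], checkid: str) -> Optional[int]:
    """The directory points to the most recent entry with this checkid."""
    return directory.get(checkid)
```

With duplicate checkids ["C1", "C2", "C1"], the sequential search returns position 0 and the partitioned lookup returns position 2.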


Checkpoint entries have two identifications: primary and secondary.


• The primary identification is the programmer-generated or system-generated
  checkid specified or requested by the CHKPT macro instruction. The primary
  identification is used when a search is made for a checkpoint entry.


• The secondary identification is identical to the system-generated checkid that
  might have been requested by CHKPT. The secondary identification is then
  used as a base to compute the system-generated checkids of entries written
  after restart has occurred.


The search of the data set’s directory prevents the system from generating checkids 
that are duplicates of checkids of existing useful entries. 


The control program identifies each checkpoint in a message to the operator; on
request, it also makes the identification available to your program. In Figure 4, the
CHKPT macro instruction requests the control program to supply an identification
(‘S’ parameter) and place it in the 8-byte field named ID. When the checkpoint is
successfully taken, the program prints the identification as part of a message to the
programmer.


         CHKPT CHKDCB,ID,'S'         Take checkpoint
         LTR   15,15                 Was checkpoint taken
         BNZ   PHASE2                No, branch to PHASE2
         PUT   STEPLOG,MESSAGE       Yes, print
*                                    checkpoint ID
PHASE2
MESSAGE  DS    0F
         DC    H'45'                 Record length
         DC    H'0'                  Flags
         DC    C'SUCCESSFUL CHKPT AT PHASE2...ID='
ID       DS    CL8
STEPLOG  DCB   DSORG=PS,MACRF=(PM),                                    X
               RECFM=V,BLKSIZE=128,                                    X
               LRECL=124,DDNAME=LOGDD
CHKDCB   DCB   DSORG=PS,MACRF=(W),RECFM=U,                             X
               DDNAME=CHKDD


Figure 4. Recording a Checkpoint Identification Assigned by the Control Program
Coding the CHKPT Macro Instruction


The CHKPT macro instruction is coded in your program. When the CHKPT 
macro executes, job step information about the user’s program, virtual-storage data 
areas, data set position, and supervisor control is written as a checkpoint entry in a 
checkpoint data set. The point at which this information is saved becomes a 
checkpoint from which a restart may be performed. After the checkpoint entry is 
written, control returns to your program at the instruction following the CHKPT 
macro. 


The CHKPT macro instruction refers to the data control block (DCB) for the 
checkpoint data set. If the data set is not open, the checkpoint routine opens it and 
then closes it after writing the checkpoint entry. If the data set is open, the 
checkpoint routine writes the checkpoint entry, but does not close the data set. 


The checkpoint data set must be on one or more magnetic tape volumes or direct
access volumes. A checkpoint data set that resides on a magnetic tape must have
IBM standard labels. Direct access volumes cannot be shared. (For restrictions on
shared DASD, see “Coding the DD Statement for a Checkpoint Data Set” on
page 11.)


The standard form of the CHKPT macro instruction is:


CHKPT {dcbaddr | DDNADDR=ddnaddr | CANCEL}
      [,{checkid addr | IDADDR=checkid addr}
      [,{checkid length | IDLNG=checkid length |
         'S' | IDLNG='S'}]]
      [,MF={S | (S,name)} | {L | (L,name)} |
          {(E,addr) | (E,addr1,addr2)}]

The options on the CHKPT macro are discussed in alphabetic order below. 


CANCEL 
cancels the request for automatic checkpoint restart. CANCEL can be used 
only with the standard form (MF= omitted, MF=S, or MF=(S,name)).
An automatic checkpoint restart can be suppressed by issuing a CHKPT with 
CANCEL. See “Canceling a Checkpoint” on page 25. 


checkid addr 
specifies the address of a programmer-provided field that is to contain a 
unique, printable identification of the checkpoint entry. This identification is 
called a checkid. The checkpoint routine writes the checkid as part of the 
entry and prints it in a message on the operator’s console when it finishes 
writing the entry. The programmer must use the checkid and code it in the 
JOB statement RESTART parameter to use the corresponding entry to 
perform a deferred restart at a specific checkpoint. If the checkid addr 
operand is omitted, the checkid length or ‘S’ operand is invalid. Valid 
characters for a checkid are: 
1*);  -/,% __>?:’ =" and blanks are valid except as first
characters or for partitioned data sets. 


checkid addr is an RX type address or register (2-12). 


checkid length or ‘S’ 
checkid length is the length in bytes of the field that contains the checkid. 
The maximum length of this field is 16 bytes when the checkpoint data set is 
physical sequential, 8 bytes when it is partitioned. By coding this operand or 
by omitting it entirely (in which case a length of 8 bytes is implied), you 
specify that your program will form an identification and store it in the 
checkid field before CHKPT is executed. If the checkid addr operand is 
omitted, this operand is invalid. 


By coding this operand as ‘S’, you specify that the checkpoint routine is to 
generate an identification 8 bytes in length and store it in the checkid field. 
If the checkid addr operand is omitted, this operand is invalid. 


If the checkid addr and checkid length or ‘S’ operands and their keyword
equivalents are all omitted, the checkpoint routine generates an identification
and writes it in the checkpoint entry and on the operator’s console, but does
not return it to your program.


If you provide the checkpoint identification and the checkpoint data set is 
sequential, the identification can be any combination of up to 16 
alphamerics, special characters, and blanks. For a partitioned data set, it 
must be a valid member name of up to eight alphamerics. The identification 
for each checkpoint should be unique. If two identifications differ only by 
having a different number of trailing blanks, the control program considers 
them to be the same. (For further information, see “How Checkpoint
Entries Are Identified” on page 19.) When a deferred step restart takes
place, this number is reset to 0. 


One way to use the checkid addr operand, or its keyword equivalent, is to 
allow your program to select fields in the records of an input data set and use 
them as checkids. Alternatively, your program may use the checkid addr and 
the ‘S’ operands and include a system-generated checkid in the current 
record of an output data set. 


dcbaddr 
is the address of the DCB for the checkpoint data set. dcbaddr is an RX type 
address or register (2-12). 


DDNADDR=
Points to an 8-character, fixed-length field that contains the name of the DD
statement for the checkpoint data set. (This keyword may be used instead of
the first positional operand, dcbaddr.) This keyword is mutually exclusive with
the positional operand dcbaddr.
ddnaddr is an RX type address or register (2-12).


IDADDR= 
Points to a variable-length field that contains, or receives, the checkid. (This 
keyword may be used instead of the second positional operand, checkid 
addr.) This keyword is mutually exclusive with and follows the same rules as 
the second positional operand. 


checkid addr is an RX type address or register (2-12). 


IDLNG= 
Specifies the length of the field that contains, or receives, the checkid. (This 
keyword may be used instead of the third positional operand, checkid length.) 
This keyword is mutually exclusive with and follows the same rules as the 
third positional operand. 


Keyword MF= Standard Form (default) 


MF=S 
Generates an inline parameter list and executable code. 


MF=(S,name) 
Generates executable code and an inline parameter list with the label name 
at the beginning of the list. name is a symbol. 


Keyword MF= List Form 


MF=L 
Generates a parameter list. (A symbol is required with this form.)


MF=(L,name) 
Generates a parameter list with the label name at the beginning of the list. 
name is a symbol. 


Keyword MF= Executable Form 


MF=(E,addr)
Generates executable code that refers to the parameter list whose name is
addr. addr can also be expressed as a register, for example, MF=(E,(R5)), if
the register is preloaded to point to a parameter list.


MF=(E,addr1,addr2)
Generates executable code that moves a model parameter list from addr2 to
addr1 and updates the list in addr1 with the positional or keyword operands
specified. addr1 and/or addr2 may be expressed in register form, for
example, MF=(E,(R5),(R6)), if the register or registers are preloaded to
point to the respective parameter lists. This form is useful for programs that
must be reentrant.


For further information on the MF= keyword, see Data Administration: Macro 
Instruction Reference. 
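The checkid rules described under checkid length or ‘S’ can be expressed as a rough validation sketch. It simplifies the rules in three ways. The first character is required to be alphameric, because the valid-character list in this copy is partly illegible. Member-name checking ignores national characters. Trailing blanks are treated as padding. The function names are hypothetical.

```python
import string

ALPHAMERIC = set(string.ascii_uppercase + string.digits)


def normalize_checkid(checkid: str) -> str:
    """Identifications that differ only in trailing blanks compare equal."""
    return checkid.rstrip(" ")


def is_valid_checkid(field: str, partitioned: bool) -> bool:
    """Simplified check of a checkid field (assumptions noted above)."""
    limit = 8 if partitioned else 16
    if len(field) > limit:
        return False              # longer than the maximum for this data set
    ident = normalize_checkid(field)
    if not ident or ident[0] not in ALPHAMERIC:
        return False              # assumption: no special character or blank first
    if partitioned:
        # Member name: starts with a letter, then alphamerics only.
        return ident[0] in string.ascii_uppercase and all(
            ch in ALPHAMERIC for ch in ident
        )
    # Sequential: alphamerics, special characters, and blanks are allowed.
    return True
```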
CHKPT Return Codes


The CHKPT macro returns a code in register 15 to indicate whether the CHKPT 
macro instruction executed successfully. If the value in register 15 does not equal 
0, register 0 gives a more detailed code. Appendix A, “Checkpoint/Restart
Codes” on page 71, contains a list of these codes and their meanings.
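As an illustration only, a program might map the register values mentioned in this chapter to descriptions as in the sketch below. The function is hypothetical and covers only the codes stated in the text:

- 0: checkpoint taken, as tested in Figure 4.
- X'04': restart in progress.
- X'08' with reason code 27: end-of-volume while writing on tape.

The reason code is modeled as the integer 27 because the text does not say whether it is decimal or hexadecimal. Appendix A remains the authoritative list.

```python
def describe_chkpt_result(r15: int, r0: int = 0) -> str:
    """Describe only the CHKPT codes mentioned in this chapter."""
    if r15 == 0:
        return "checkpoint taken"
    if r15 == 0x04:
        return "restart from this checkpoint in progress"
    if r15 == 0x08 and r0 == 27:
        # Second end-of-volume while writing a checkpoint entry on tape.
        return "checkpoint not taken: end-of-volume while writing the entry"
    return f"see Appendix A (R15=X'{r15:02X}', R0={r0})"
```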


Use of CHKPT with Other Macro Instructions 


EXTRACT: The EXTRACT macro obtains information from the task control 
block (TCB). TCB information may change when the task terminates and the job 
step restarts. If the information is needed after restart, the EXTRACT macro
instruction should be reissued when control returns from the CHKPT macro after a
restart, as shown in Figure 5.


         EXTRACT ANSADDR,FIELDS=(ALL)   Obtain TCB
*                                       information
         CHKPT CHKPTDCB                 Establish
*                                       checkpoint
         CH    15,=H'4'                 Is restart in
*                                       progress
         BNE   NRESTART                 No, branch to
*                                       NRESTART
         EXTRACT ANSADDR,FIELDS=(ALL)   Yes, obtain new
*                                       information
NRESTART


Figure 5. Obtaining Updated TCB Information after Restart


SETPRT: The SETPRT macro in data management selects the universal character 
set (UCS) buffer for an IBM 3203-5, 3211, 4245, 4248, or 1403 Printer with the 
Universal Character Set feature and the forms control buffer (FCB) for the 3203-5 
or 3211 Printers, 3262 Model 5, or the 3800 Printing Subsystem. 

For 3800, 3262, 4248, and 4245 Printers, the buffer contents are not saved when a 
checkpoint is taken. To ensure that all lines are printed before taking a checkpoint, 
you should take one of the following actions: 

• Close the printer with the CLOSE macro instruction.

• Issue the “Clear Printer” command.

• Reprint the data after restart.
For the 3800 Printing Subsystem (Model 1 or Model 3 in compatibility mode), the
SETPRT macro instruction initially sets or dynamically changes the control
information for the printer. For additional information on the SETPRT macro, see
IBM 3800 Printing Subsystem Programmer’s Guide and Data Administration: Macro
Instruction Reference.


For the 3800 Printing Subsystem (Model 1 or Model 3 in compatibility mode) at
checkpoint restart, the job has the JFCBE parameters as modified during the last
JFCBE exit. If no exit was requested or if the JFCBE was not flagged as being
modified during the exit, the JFCBE reflects the values coded in the restart JCL.
For additional information on the JFCBE exit, see Data Administration Guide.


WTOR: The reply to a WTOR macro instruction must be received before a
CHKPT is issued.


STIMER: A time interval established by the STIMER macro must be completed 
before a CHKPT is issued. 


ATTACH: If ATTACH is issued in the program using CHKPT, all subtasks 
created must terminate before a CHKPT is issued; the job-step task must be the 
only task of the step. 


SETDEV: If an IBM 3890 Document Processor unit-record device is used, the 
following SETDEV macro instruction must be issued after a successful restart: 


3890 device: SETDEV dcbaddr,DEVT=3890,IREC=irecaddr
                    [,PROGRAM=progname]


where: 


dcbaddr 
is the address of the associated DCB. 


irecaddr 
is the address of an initialization record. 


progname 
is the name of the stacker control instruction program loaded in 
SYS1.IMAGELIB (this parameter is optional). 


PCLINK: If a PCLINK STACK is issued, a PCLINK UNSTACK must also be 
issued against all existing stacks before a CHKPT can be issued. There must not 
be any PCLINK STACKs in existence when a CHKPT macro is issued. 


Canceling a Checkpoint 


There are certain conditions under which you may not wish to have a job
automatically restarted in case of failure:


• If you have not yet taken a checkpoint and the job has already been restarted.


• If you are updating critical data that must be complete and accurate before an
  automatic restart can be successful.
In these cases, you can take a checkpoint but avoid automatic restart by using the
CANCEL option on the CHKPT macro. If CHKPT CANCEL is used after a
checkpoint is taken, the job is not automatically restarted after failure. For the
format of the CHKPT macro, see “Coding the CHKPT Macro Instruction” on
page 21.


After being restarted, the job step may again terminate abnormally. If it does, it 
may be restarted from the same checkpoint, subject to operator authorization. To 
avoid restarting the job step twice from the same checkpoint, the sequence shown 
in Figure 6 may be coded. 


         CHKPT CHKPTDCB        Establish checkpoint
         CH    15,=H'4'        Is restart in progress
         BNE   NRESTART        No, branch to NRESTART
         CHKPT CANCEL          Yes, cancel restart request
NRESTART


Figure 6. Canceling a Request for Automatic Restart


After the successful initiation of a checkpoint restart, the system places a return
code of hexadecimal 04 in register 15 and returns control to your program at the
instruction following the CHKPT macro instruction. At this time, a request for
another automatic restart at the same checkpoint is normally in effect. In
Figure 6, the instruction that follows the CHKPT macro instruction tests the
return code to determine whether control has been returned as the result of a
restart. If the return code is 04, a restart has just occurred, and a second CHKPT
macro instruction is executed. This macro instruction has a CANCEL operand,
which cancels the existing request for an automatic restart. If the job step again
terminates abnormally after a restart from the checkpoint, automatic restart can
occur only at the beginning of the step. It will not occur at the checkpoint
preceding the canceled checkpoint.
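The effect of the Figure 6 sequence on the pending automatic restart point can be modeled as follows. This is a hypothetical sketch. The class, its attribute, and the initial value are illustrative; the initial value assumes automatic step restart was requested for the step.

```python
STEP_START = "beginning of step"


class RestartRequest:
    """Hypothetical model of where an automatic restart would occur."""

    def __init__(self) -> None:
        # Assumption: automatic step restart was requested for this step.
        self.point: str = STEP_START

    def checkpoint_taken(self, checkid: str) -> None:
        # A successful CHKPT makes this checkpoint the automatic restart point.
        self.point = checkid

    def cancel(self) -> None:
        # CHKPT CANCEL: only a restart at the beginning of the step remains.
        self.point = STEP_START


def figure6_sequence(request: RestartRequest, r15: int) -> str:
    """Model of the code that follows CHKPT CHKPTDCB in Figure 6."""
    if r15 == 4:           # CH 15,=H'4' / BNE NRESTART
        request.cancel()   # CHKPT CANCEL
    return request.point   # NRESTART
```

On the original run CHKPT returns 0 and the checkpoint stays pending. After a restart it returns X'04', and the sketch falls back to a restart at the beginning of the step.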


Shared and Serially Reusable Resources 


Requesting Serially Reusable Resources 


When a job step terminates, it loses control of serially reusable resources. If the
step is restarted, it must request all the resources needed to continue processing.
Explicit use of a serially reusable resource is requested when the user’s program
issues the ENQ macro instruction. If the program issues the ENQ and takes a
checkpoint, it must issue the ENQ again whenever restart occurs at the checkpoint.
Figure 7 shows a program that requests a serially reusable resource by issuing an
ENQ before establishing a checkpoint. After the checkpoint, it tests for a restart. 
If one has occurred, it requests the same resource again. It requests the resource 
again because the job step terminated, lost control of the resource, and then 
restarted from the checkpoint. 





         ENQ   (QADDR,RADDR)
         CHKPT CHKPTDCB
         CH    15,=H'4'
         BNE   NRESTRT
         ENQ   (QADDR,RADDR)
NRESTRT
         DEQ   (QADDR,RADDR)


Figure 7. Requesting a Resource after Restart


Some serially reusable resources are requested implicitly by issuing data
management macro instructions. These resources may be records that you are
processing or tracks on a direct access device. To ensure correct processing, you
must not establish checkpoints while your program controls these resources:


• If the basic direct access method (BDAM) is used, your program must, before
  executing the CHKPT macro, execute either the WRITE or the RELEX macro
  to release a record that is read with exclusive control.


• If BDAM is used to add a record to a data set with variable-length or
  undefined records, BDAM issues an ENQ macro instruction for the capacity
  record (R0). Your program must execute the WAIT or CHECK macro to
  check completion of the write operation before it executes CHKPT. After
  execution of the WAIT or CHECK macro, the resources are dequeued.


• If the basic indexed sequential access method (BISAM) is used, a checkpoint
  must not be taken before completion of a write operation. If a record is read
  for update, a checkpoint must not be taken before writing the updated record
  and waiting for the write operation to be checked.


• If the queued indexed sequential access method (QISAM) is used, and if a
  SETL macro instruction was issued, an ESETL macro instruction must be
  issued before taking a checkpoint. Another SETL macro instruction may be
  issued after the checkpoint.
• If a VSAM cluster is implicitly specified, the restart program obtains the names
  of the data set and the index from the catalog and reissues an ENQ macro
  against each of them; therefore, no special considerations are required.


Tape Volume DEQ at Demount 


Tape volume DEQ at demount is not affected by deferred or automatic step 
restarts. If restart is from a checkpoint, all volumes that are dequeued during the 
processing of the job are enqueued when the restart job is initiated. This includes 
all volumes referenced in the JFCB at the time of checkpoint. Subsequent restart 
processing tests each tape data set open at the time of checkpoint to determine 
whether the following conditions are satisfied: 


1. The data set was open for OUTPUT, OUTIN, EXTEND, OUTINX, or 
INOUT when the checkpoint was taken. 


2. JFCDQDSP was set to 1 in the JFCB at the time of checkpoint. 
3. The program restarting is APF-authorized. 


If these conditions are satisfied, the DEQ at demount is reestablished so that the 
data set’s current volume is dequeued when it is demounted. All volume serials in 
the JFCB and JFCB extensions prior to the current volume are dequeued during 
the restart process for that data set. 
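The three conditions can be written as a single predicate. In this sketch the names are hypothetical, JFCDQDSP is modeled as a boolean, and volume serials are modeled as a list together with the index of the current volume.

```python
from typing import List

# OPEN options that qualify for reestablishing DEQ at demount.
DEQ_OPEN_OPTIONS = {"OUTPUT", "OUTIN", "EXTEND", "OUTINX", "INOUT"}


def reestablish_deq_at_demount(open_option: str, jfcdqdsp: bool,
                               apf_authorized: bool) -> bool:
    """All three conditions must hold for a tape data set open at checkpoint."""
    return open_option in DEQ_OPEN_OPTIONS and jfcdqdsp and apf_authorized


def volumes_dequeued_at_restart(volsers: List[str], current: int) -> List[str]:
    """When DEQ at demount is reestablished, the volumes before the current
    volume are dequeued during restart."""
    return volsers[:current]
```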


Because all volumes, including those dequeued through the DEQ at demount, are 
enqueued at restart, if only for a short time, automatic checkpoint restart may 
result in a “waiting for volumes” condition if any of the dequeued volumes are 
allocated to another job. A deferred checkpoint restart can be timed to avoid such 
a delay. 


Special Operating System Features 


Shared DASD 


At some installations, a direct access storage device is shared by two or more
independent computing systems. This device is a serially reusable resource. If it is
being used when a checkpoint is taken, it must be requested after a restart from the
checkpoint. This resource is requested by a special macro, RESERVE, described in
Supervisor Services and Macro Instructions. For restrictions on shared DASD, see
“Coding the DD Statement for a Checkpoint Data Set” on page 11.


When using dynamic allocation, the following should be considered: 


• For data sets that are not opened during the original execution when
  nonspecific tape volumes were requested, the volumes assigned at restart may
  not be the same as those the referenced DD initially assigned. (This occurs
  when the DD statement specified VOL=REF= to a DD that was unallocated
  at the time the checkpoint was taken.)


• If a VIO data set is dynamically allocated before a checkpoint and unallocated
  after the checkpoint, the step is ineligible for restart until the next checkpoint is
  taken.
VIO Data Sets


If a VIO data set is open when the checkpoint is taken, it is supported for
automatic restart only.


If restart is deferred, VIO data sets for BSAM or QSAM must be redefined as
dummy data sets. If they are not redefined, or if an access method other than
BSAM or QSAM accesses the VIO data set, the job fails.


Dynamic Allocation


Checkpoint/Restart supports the use of dynamic allocation by problem programs.


Shared Resources


Checkpoint/Restart is supported for the local shared resources feature, but
repositioning is not allowed. A checkpoint cannot be taken when the global
shared resources feature is used.


For automatic checkpoint restart, deleting a data set open at the time the
checkpoint was taken or deallocating a SYSOUT data set to the job entry
subsystem makes the step ineligible for restart until the next checkpoint is taken.


For automatic step restart, any of the following makes the step ineligible for
restart:


• KEEP, CATLG, or UNCATLG a NEW JCL-specified data set.


• DELETE a JCL-specified data set whose initial status upon entry to the step was
  OLD.


• DELETE an OLD dynamically allocated data set that had volumes specified
  when allocated.


• UNCATLG a JCL-specified data set whose initial status upon entry to the step
  was OLD and that did not have volumes specified.
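The four rules for automatic step restart can be sketched as a predicate over the dispositions a program applies to data sets during the step. The data class, its field names, and the string values are hypothetical. They simply mirror the wording of the list above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DataSetAction:
    origin: str                 # "JCL" or "DYNAMIC"
    initial_status: str         # "NEW" or "OLD"
    disposition: str            # "KEEP", "CATLG", "UNCATLG", or "DELETE"
    volumes_specified: bool = False


def makes_step_ineligible(a: DataSetAction) -> bool:
    jcl, dynamic = a.origin == "JCL", a.origin == "DYNAMIC"
    old = a.initial_status == "OLD"
    if jcl and a.initial_status == "NEW" and a.disposition in {"KEEP", "CATLG", "UNCATLG"}:
        return True
    if jcl and old and a.disposition == "DELETE":
        return True
    if dynamic and old and a.disposition == "DELETE" and a.volumes_specified:
        return True
    if jcl and old and a.disposition == "UNCATLG" and not a.volumes_specified:
        return True
    return False


def eligible_for_automatic_step_restart(actions: Iterable[DataSetAction]) -> bool:
    return not any(makes_step_ineligible(a) for a in actions)
```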


Cross-Memory Support Restrictions 


A checkpoint is not allowed when storage access across address space boundaries 
has been established. When using cross-memory support, the restrictions shown in 
Figure 8 on page 30 may apply. 
                                   Restriction                       Restriction
Macros                             Code           Instructions       Code
LXRES                              2              EPAR               2
ETCRE                              2              ESAR               2
ETCON                              2              IAC                2
AXRES                              2              IVSK               2
AXEXT                              2              MVCP               3
AXSET                              2              MVCS               3
ATSET                              2              MVCK               3
PCLINK                             1              PC*                4
                                                  PT                 3 and 4
                                                  SAC                1
                                                  SSAR               1

Code Definitions:
1. You must release this resource before checkpoint and reestablish it
   after checkpoint or restart.
2. You must reestablish this resource after a restart.
3. You must be sure that the keys and/or ASIDs are correct after a restart.
4. The result of any attempt to restart from a checkpoint taken in a program
   call (PC) routine is unpredictable. If a PC routine issues a PCLINK, the error
   is detected and the checkpoint request is refused. It is impossible to detect a
   PC routine when the PC routine does not issue a PCLINK and is entered
   without a space switch. If a PC routine issues a checkpoint, that PC routine
   must ensure that all the requisite values needed for a PT instruction are
   current before issuing the PT instruction. These values may change after a
   restart (for example, the ASID might be different).


Figure 8. Cross-Memory Support Restrictions
Chapter 3. Requesting Restart


This chapter explains how you may request restart. The topics discussed are:


• Automatic restart
  - Automatic checkpoint restart
  - Automatic step restart
• Deferred restart
  - Deferred checkpoint restart
  - Deferred step restart
• Coding the RD and RESTART parameters


Automatic Restart


Because an automatic step restart and an automatic checkpoint restart are similar,
they are discussed together in the following sections.


Operator Message Sequence 


During processing related to an automatic checkpoint restart, the system issues the
following sequence of messages to the operator:


1. A message is issued each time a checkpoint entry is written. Each message
   contains the checkpoint identification.


2. If the job step terminates because of an abend condition, an abend message is
   issued for the job step.


3. If the abend code makes the job step eligible for restart and a job journal is
   present, an authorization for restart message is issued that requires a reply.


4. A message is issued indicating the virtual storage requirements (beginning
   address and ending address) of the job step to be restarted.


5. Normal mount and restart messages are issued.
6. If password-protected data sets are to be repositioned at restart time, a
   password message is issued.


7. If additional checkpoint data sets (other than the data set used for restart) are 
encountered at restart time, checkpoint data set security messages are issued. 


8. A successful restart message is issued. 


During processing related to an automatic step restart after a job step has 
terminated abnormally, the sequence is the following: 


1. An abend message is issued for the job step. 


2. If the abend code makes the job step eligible for restart, and a job journal is 
present, an authorization message is issued that requires a reply. 


3. Normal mount messages are issued. 


Note: The abend message, which is issued as:


IEF450I jobname.stepname.procstepname ABEND code


is always displayed if a job step terminates abnormally. In addition, if the job step
is being executed and the system fails, this message will be displayed during the
next IPL if a system-supported restart is performed. The code part of the message
has the form Shhh (S followed by a 3-character hexadecimal number) if the system
executed the ABEND macro instruction, or Udddd (U followed by a 4-digit decimal
number) if your program executed the ABEND. It is S2F3 if a system failure
occurred. For additional information on messages, see System Messages.
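Below is a small parser for the code field of IEF450I, following the two formats described in the note. The function name and the returned labels are hypothetical.

```python
import re
from typing import Tuple


def parse_abend_code(code: str) -> Tuple[str, int]:
    """Return (origin, numeric value) for the code field of IEF450I."""
    system = re.fullmatch(r"S([0-9A-F]{3})", code)
    if system:
        # S2F3 indicates that a system failure occurred.
        origin = "system failure" if code == "S2F3" else "system"
        return origin, int(system.group(1), 16)
    user = re.fullmatch(r"U([0-9]{4})", code)
    if user:
        return "user", int(user.group(1))
    raise ValueError(f"not an abend code: {code!r}")
```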


Operator Considerations 


If a step requests automatic restart and is eligible for restart, the system displays 
the following message to request authorization for the restart: 


xx IEF225D SHOULD jobname.stepname.procstepname [checkid] RESTART





Checkid appears in the message only if restart at a checkpoint is requested. It 
contains from 1 to 16 characters and identifies the checkpoint entry to be used to 
perform the restart. The operator must reply to the request for authorization as 
follows: 


REPLY xx,{'YES' | 'NO' | 'HOLD'}
YES authorizes the restart, HOLD postpones it, and NO prohibits it.





If the advisability of allowing the restart is not readily apparent, the operator
should reply HOLD to the authorization message. If it is later decided that the
restart should be allowed, the operator can initiate it with the RELEASE command,
achieving the same result as an initial YES reply. If the decision is to deny the
restart, the operator can cancel the job in the HOLD queue. Note, however, that
HOLD, like YES, causes special disposition processing to occur during the
abnormal termination. This processing keeps all OLD (or MOD) data sets and
deletes all NEW data sets if a step restart is requested, and keeps all data sets if a
checkpoint restart is requested. If you decide to disallow the restart but want to
allow normal disposition (as requested on the job’s DD statements) of the data sets
that were kept, release the job, wait until the restart has begun, and then cancel the
job.


If a job is canceled in the HOLD queue, any paging space allocated to a VIO data
set is not released until the system is reinitialized. To avoid this situation, the
operator should release the job, wait until the restart begins, and then cancel the
job.


After the authorization request and before replying YES, the operator can use the
VARY and UNLOAD commands to make the system’s volume and device
configuration during the restart execution of the job differ from what it was
during the original execution. In this way, the operator can avoid using
defective volumes and devices.


The ability to use a different volume usually exists only in the case of a NEW data
set on a nonspecific volume. If a checkpoint restart is performed, the data set must
not be open at the checkpoint. The ability to use a different device does not apply
to the device or devices containing the SYSRES, SPOOL, or PAGE packs. If a
checkpoint restart is to be performed and a data set is open at the checkpoint, the
same type of device must be allocated to the data set during both the original and
restart executions.


Note: When an initiator selects a job for automatic step restart and the job is 
reinterpreted, no message is issued to the operator regarding virtual storage 
requirements, because its execution is not location dependent. 

If ADDRSPC=REAL is specified on the JOB or EXEC statement, the restart may
be delayed while the system waits for the allocation of storage. If another job is
using the required storage, you do not receive a message, only a delay. Enter a
DISPLAY A command to see if a system task or another job is using the storage
required by the job with a nonpageable region. You can then stop or cancel the
conflicting task or job.
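As a minimal sketch (the job, step, and program names and the REGION value are hypothetical, not taken from this manual), a step that runs in a nonpageable region and requests automatic restart might be coded as follows. If such a step is restarted, the restart can wait silently for its storage, as described above.

```jcl
//MYJOB    JOB  MSGLEVEL=1,RD=R
//* Hypothetical nonpageable (ADDRSPC=REAL) step. If it is
//* restarted automatically, the restart may wait, without a
//* message, until the storage it needs is free.
//STEP1    EXEC PGM=MYPROG,ADDRSPC=REAL,REGION=256K
```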


How to Request Automatic Checkpoint Restart
An automatic checkpoint restart occurs if:
• The job journal option is specified, or
• The step requests restart (RD=R); and
• A successful checkpoint has been taken; and
• The step is eligible for restart because it was terminated by an abend macro
that returned an eligible completion code (specified by the CKPTREST macro
at system generation), or because a system failure occurred; and
• The operator authorizes the restart. This authority enables the operator to
control the number of restarts of the same step or from the same checkpoint.


If a step fails and an automatic checkpoint restart is requested, restart occurs
automatically at the last checkpoint taken.


Execution of the CHKPT macro requests this type of restart and establishes the
checkpoint. You must provide an ordinary DD statement for the checkpoint data
set.


Figure 9 illustrates a job requesting automatic restart at a checkpoint. 


//MYJOB    JOB  MSGLEVEL=1¹
//STEP1    EXEC
//STEP2    EXEC PGM=MYPROG      MYPROG issues the CHKPT macro
//NAME1    DD   DSN=NAME2       Describes the data set into
//*                             which checkpoint entries
//*                             are to be written


Figure 9. Requesting Automatic Checkpoint Restart


Note to Figure 9: 


1. MSGLEVEL=1 is optional.


Automatic Step Restart 
An automatic step restart occurs if all of the following are satisfied:
• The step requests restart (RD=R or RD=RNC); and
• The step is eligible for restart because it was terminated by an abend macro
that returned an eligible completion code (specified by the CKPTREST macro
at system generation), or because a system failure occurred; and
• The operator authorizes the restart. This authority enables the operator to
control the number of restarts of the same step or from the same checkpoint.


If a step fails and automatic step restart is requested, restart occurs automatically at 
the beginning of the step that failed. 


Automatic step restart may be requested by coding the RD parameter (RD=R or
RD=RNC) on the JOB or EXEC statement in the originally submitted job deck.
Checkpoint processing is suppressed if RD=RNC is coded.
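As a hedged sketch (job, step, and program names are hypothetical), RD=RNC can be coded on a single EXEC statement so that only that step requests automatic step restart and any CHKPT macro its program issues is suppressed:

```jcl
//MYJOB    JOB  MSGLEVEL=1
//STEP1    EXEC PGM=PROGA
//* RD=RNC: if STEP2 is eligible and the operator authorizes it,
//* STEP2 is restarted at its beginning. Any CHKPT macro issued
//* by PROGB is suppressed.
//STEP2    EXEC PGM=PROGB,RD=RNC
//STEP3    EXEC PGM=PROGC
```

Because RD appears only on STEP2, a failure in STEP1 or STEP3 does not request an automatic restart.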
Figure 10 illustrates the JCL of a job requesting automatic step restart.


//MYJOB    JOB  MSGLEVEL=1¹,RD=R     Requests automatic
//*                                  restart at the
//*                                  beginning of any step
//*                                  that terminates
//*                                  abnormally
//STEP1    EXEC
//STEP2    EXEC PGM=MYPROG,RD=R²     Requests automatic
//*                                  restart of STEP2 if it
//*                                  terminates abnormally
//STEP3    EXEC


Figure 10. Requesting Automatic Step Restart


Notes to Figure 10: 
1. MSGLEVEL=1 is optional.


2. If RD=R appears on the JOB statement, it is not required on the
EXEC statement.


How MOD Data Sets Are Handled during Automatic Step Restart 





When automatic step restart is requested, the system saves, for each MOD data set
that is on a direct access volume and used by the step, the TTR (and track balance)
of the end of the data set. The TTR is saved when each data set is first opened. If
restart occurs, the saved TTRs indicate the ends of the data sets when the data sets
are opened again. Thus, any data the step wrote into such a data set during the
original execution is overwritten during the restart. This action does not occur if
restart at a checkpoint occurs.


If a MOD data set on tape is used in the restart step, the data set is not 
repositioned at the start of the restart execution. Data written into it during the 
restart execution follows the data written during the original execution. You may 
reposition the data set so the data written during the restart execution overlays the 
data written during the original execution. 


If the data set is opened with the OUTINX or EXTEND option, and its DD
disposition parameter was:


DISP=NEW: there are no restrictions, because the data set is reallocated before the
restart.


DISP=OLD or DISP=SHR: the data set is repositioned after the last record added
before the job terminated. Records can therefore be added to the data set twice.


DISP=MOD: the following restrictions apply to a multivolume data set:


• If the volume sequence does not point to the last volume, records are overlaid
or an abend occurs.
• If the volume sequence is not supplied, loading starts on the last volume on
which the data set exists. For tape data sets only, if the end of data occurs
on a volume other than the last allocated volume, processing starts on the
wrong volume.


Caution Concerning Automatic Step Restart after a Checkpoint Restart 


If a step executes as the result of an automatic or a deferred checkpoint restart,
if you then attempt an automatic step restart of this step, and if the JCL of the step
refers to any new data sets on direct access volumes, the attempt may be
unsuccessful. The failure occurs because, when the step is initiated during the
checkpoint restart, the system changes the disposition of all the step’s NEW data
sets to OLD.


When the special disposition processing that prepares for a step restart occurs, all
data sets used by the step appear as OLD and are kept. When the step restart
occurs, the scheduler tries to obtain space for data sets specified as NEW in the
JCL for the step. If the attempt to obtain space is made on the volume that
already contains the data set, the failure occurs because of the apparent presence of
a “duplicate DSCB on direct access volume.”


JCL Requirements and Restrictions on Automatic Restart 


To allow an automatic step restart or an automatic checkpoint restart, you must
observe the following rules when preparing the job deck used in the original
execution:


1. If a step restart is desired, the RD parameter must be coded to request the
restart.


2. If a checkpoint restart is desired, a DD statement for the checkpoint data set
must be included in the step that executes the CHKPT macro instruction.


3. The EXEC statements in the job deck must have unique names. (Upon restart, 
the system searches for a named step.) 


4. If commands are in the original deck, the commands are not reexecuted when 
restart occurs. 


5. The DD statement VOL=REF parameter is ignored if restart is attempted 
from a checkpoint taken when the DD statements are opened out of order and 
the referenced DD statement requested nonspecific tape volumes. 
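A minimal original deck that follows rules 1 through 3 might look like the sketch below. All data set, step, and program names are hypothetical; UNIT=SYSDA and the SPACE values are installation-dependent assumptions, and the ddname CHKDD is assumed to match the DCB that MYPROG passes to its CHKPT macro.

```jcl
//MYJOB    JOB  MSGLEVEL=1,RD=R
//* Rule 1: RD=R on the JOB statement requests restart for
//*         every step.
//* Rule 3: every EXEC statement has a unique name.
//STEP1    EXEC PGM=PROGA
//* Rule 2: MYPROG issues CHKPT, so the step includes a DD
//*         statement for the checkpoint data set.
//STEP2    EXEC PGM=MYPROG
//CHKDD    DD   DSN=MY.CHKPT.DATA,DISP=(NEW,CATLG,CATLG),
//              UNIT=SYSDA,SPACE=(TRK,(10,5))
//STEP3    EXEC PGM=PROGC
```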


Resource Variations Allowed in Automatic Restart 


The system’s device and volume configuration during a restart execution of a job 
can differ from the status during the original execution of the job. 


The ability to use a different volume exists only in the case of a new data set on a
nonspecific volume. If a checkpoint restart is to be performed, the data set must
not be open at the checkpoint. The ability to use a different device does not apply
to the device or devices containing the SYSRES volume and the LINKLIB data set.
Also, if a checkpoint restart is to be performed, the same device type must be
allocated to the data set during both the original and restart executions.


How the System Works at Automatic Restart 


How Data Set Disposition Is Determined: When a step requests restart and is eligible
for restart, disposition processing of the data sets used by the step or by the job
does not occur until the operator replies to the request for authorization. If the
operator denies restart, disposition processing occurs normally, programmer-specified
final or conditional dispositions are performed, and any step that you indicated is
to be executed after abnormal termination is executed. If a restart is successfully
performed, such a step is not executed.


• If step restart is to occur, all data sets with OLD or MOD dispositions in the
restart step and all data sets passed around the restart step are kept, even if
they are declared temporary. (Temporary data sets normally cannot be kept.)


• All data sets with NEW dispositions in the restart step are deleted.


• If a checkpoint restart is to occur, all data sets used by the job (those that were
not previously disposed of) are kept.


If the operator authorizes restart, any step to be executed after abnormal 
termination will not be executed, because the results appear as if abnormal 
termination did not occur. 


If the operator performs an operator-deferred restart by replying HOLD to the
request for authorization, a CANCEL command for the job may be issued instead
of a RELEASE command. If CANCEL is issued, no further data set disposition
processing or step executions occur. The disposition of these data sets remains as it
was when the HOLD was issued.


How the Job Deck Is Reinterpreted and the SWA Merged: After disposition 
processing is completed, the job is reenqueued by the job entry subsystem and is 
eligible for selection. After the job is selected, the system begins restart processing 
by reinterpreting the job deck and creating a new scheduler work area (SWA) 
containing the required job-related information. The system uses an internal 
representation of the original job deck to perform this function. The job is not 
read in again. 


After SWA is re-created, its contents are updated by the system with information 
saved on the job journal. If automatic checkpoint restart is performed, the SWA is 
again updated with information that is saved in the last valid checkpoint entry on 
the checkpoint data set. 


When the information is merged and step restart occurs, the SWA appears the same 
as it did before the original execution of the restart step. If checkpoint restart 
occurs, the SWA differs from its original form in the following ways: 


• Except for data sets that requested nonspecific tape volumes and were not opened
during the original execution, data sets specified as NEW in the restart step have
their dispositions changed to OLD.
• In the case of data sets for which nonspecific volumes are requested in the
restart step, the work queue entry describes the device type and serial numbers
of the volumes assigned to the data sets during the original execution.


• In the case of multivolume data sets, the work queue entry indicates which
volumes are processed at the checkpoint. These volumes, and not the first
volumes of the data sets, are mounted (if they have not remained mounted)
during the restart. For VSAM, however, volume mounting is based on the
present catalog information and may result in the mounting of unneeded
volumes; for example, the first volume of a sequential data set may be
temporarily mounted although it is no longer required at the checkpoint being
restarted and may be demounted almost immediately. This situation may be
avoided by specifying the parallel mount subparameter of the DD
statement UNIT parameter (UNIT=(,P)). For further information, see the
publication JCL.


• The system-generated name for any temporary data set uses the time stamp
obtained when the job was originally interpreted.
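The parallel mount subparameter mentioned above might be coded as in this sketch. The ddname and data set name are hypothetical, and the data set is assumed to be cataloged so that the device type can be omitted, as in the UNIT=(,P) form shown in the text.

```jcl
//* Hypothetical cataloged multivolume data set. UNIT=(,P)
//* requests parallel mounting: one device for each volume,
//* with the device type taken from the catalog.
//MULTIVOL DD   DSN=MY.MULTIVOL.DATA,DISP=OLD,UNIT=(,P)
```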


In addition, any modification made to the job’s environment by the use of the
dynamic allocation facilities before the last checkpoint is reflected in the re-created
SWA. The SWA appears as it did at the time of the checkpoint.


How a Step Restart Is Initiated 


A step restart is initiated in the same way as a step is during a normal execution. The
devices allocated to the restart step can be different (but of the same device type)
from the devices allocated originally. If the allocated devices differ, volumes must
be moved from one device to another. If automatic volume recognition (AVR) is
used and if the devices are available for allocation, devices containing the required
volumes are allocated.


Unless the volumes are already mounted, normal mounting messages request the
operator to mount the required volumes on the devices after devices are allocated
to the restart step. Processing then resumes on the requested volumes.
Deferred Restart


Message Sequence 


To perform a deferred checkpoint restart, the job to be restarted is resubmitted in 
an input job stream. Messages that contain checkpoint entry identifications are 
displayed on the console during the original execution of the job and thus may be 
used by the programmer preparing the job for resubmission. When the resubmitted 
job is restarted, messages appear on the console in the following sequence: 


1. A message asking if the checkpoint data set (used for restart) is a secure 
volume. 


2. If additional checkpoint data sets (other than the data set used for restart) are 
encountered at restart time, checkpoint data set security messages are issued. 


3. A message indicating the virtual-storage requirements of the job. 
4. Normal mount messages. 


5. If password-protected data sets are to be repositioned at restart time, a 
password message is issued. 


6. A successful restart message. 


To perform a deferred step restart, the job to be restarted is resubmitted. Normal
mount messages are displayed. A message indicating the file protection status of
the checkpoint/restart volume is not displayed.


Operator Considerations 


When a job is resubmitted to perform a deferred checkpoint restart (the RESTART 
parameter is coded on the JOB statement with a checkid operand), the processing 
is essentially the same as that during an automatic checkpoint restart after the 
restart reader has reinterpreted the job. A message is issued to the operator, 
indicating the virtual-storage requirements of the job. 


The required virtual-storage area can also be unavailable for the following
reasons:


• The REGION size parameter for the step is larger when the job is resubmitted
than in the original execution, and the area required is at different addresses
than the area available.


• A new IPL is performed and, because of different IPL options specified by the
operator, the area required is larger than the area available.


If these conditions exist, a message is displayed, indicating that virtual storage for 
the job step to be restarted is unavailable. 
Deferred Step Restart


You cause a deferred step restart of a job by coding the RESTART parameter on
the JOB statement and resubmitting the job. The parameter specifies a job step or
a step of a cataloged procedure. It restarts the job at the beginning of the specified
step. Steps preceding the restart step are interpreted, but not initiated.


The CHKPT macro instruction may or may not be coded in your program. 
Figure 11 illustrates a job originally submitted and the same job resubmitted for 
step restart. Assume that the results of STEP2 were unsatisfactory because of 
abnormal termination or incorrect data when the job was executed originally. 


Original Deck

//MYJOB    JOB  MSGLEVEL=1¹
//*
//STEP1    EXEC
//STEP2    EXEC PGM=MYPROG
//STEP3    EXEC


Resubmitted Deck

//MYJOB    JOB  MSGLEVEL=1¹,
//              RESTART=STEP2
//STEP1    EXEC
//STEP2    EXEC PGM=MYPROG
//STEP3    EXEC


Figure 11. Requesting a Deferred Step Restart


Note to Figure 11: 


1. MSGLEVEL=1 is optional.
JCL Requirements and Restrictions on Deferred Step Restart


To perform a deferred step restart, you must provide the data set environment 
required by the restart job. This may be accomplished by using the conditional 
disposition subparameter in the appropriate DD statements during the original 
execution of the job. Conditional dispositions in the original deck must be used to: 


• Delete all NEW data sets used by the step to be restarted.


• Catalog all data sets passed from steps preceding the restart step to the restart
step or to steps following the restart step. Abnormal termination of the restart
step, when it is originally run, causes cataloging of the passed data sets, so the
information is available to the following steps when the deck is resubmitted.


• Keep all OLD data sets used by the restart step, other than those passed to the
step.
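The three conditional dispositions above might be coded in the original deck as in this sketch, assuming STEP2 is the step that may later be restarted. Every name is hypothetical, and UNIT=SYSDA and the SPACE values are installation-dependent assumptions; the third DISP subparameter is the one that applies if the step terminates abnormally.

```jcl
//MYJOB    JOB  MSGLEVEL=1
//STEP1    EXEC PGM=PROGA
//* Passed to STEP3; cataloged if the job ends abnormally.
//PASSDS   DD   DSN=MY.PASSED.DATA,DISP=(NEW,PASS,CATLG),
//              UNIT=SYSDA,SPACE=(TRK,(5,5))
//STEP2    EXEC PGM=MYPROG
//* NEW data set in the step to be restarted: deleted on abend.
//OUTDS    DD   DSN=MY.OUTPUT.DATA,DISP=(NEW,CATLG,DELETE),
//              UNIT=SYSDA,SPACE=(TRK,(5,5))
//* OLD data set used by the step: kept on abend.
//MASTER   DD   DSN=MY.MASTER.DATA,DISP=(OLD,KEEP,KEEP)
//STEP3    EXEC PGM=PROGC
//INDS     DD   DSN=MY.PASSED.DATA,DISP=(OLD,DELETE)
```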


If a MOD data set on tape is used in the restart step, the data set is not 
repositioned at the start of the restart execution, and data written into it during the 
restart execution follows the data written during the original execution. You may 
want to reposition the data set so the data written during the restart execution 
overlays the data written during the original execution. 


Any data sets that are dynamically deallocated have the disposition specified at the
time deallocation occurred. Conditional disposition processing is not done during
abend.


The following rules apply to the restart deck:


1. Code the RESTART parameter on the JOB statement.


2. If data sets are passed from steps preceding the restart step to the restart step
or to steps following the restart step, the DD statements receiving the data sets
must entirely define the data sets. They must explicitly specify volume serial
number, device type, data set sequence number, and label type, unless this
information can be retrieved from the catalog. It is recommended that passed
data sets be conditionally cataloged during abnormal termination of the original
execution. Label type cannot be retrieved from the catalog.


3. Generation data sets created and cataloged in steps preceding the restart step
must not be referred to in the restart step or in steps following the restart step
by the relative generation numbers used to create them. They must be referred
to by their current relative generation numbers. For example, a data set created
as the +1 data set must be referred to as the 0 data set (assuming that the +2
data set was not also created).


4. The EXEC statement PGM and COND parameters and the DD statement
VOL=REF parameter must not be used in the restart step or in steps following
the restart step if they contain values of the form stepname or
stepname.procstepname, referring to a step preceding the restart step.


5. The DD statement VOL=REF parameter is ignored if restart is attempted
from a checkpoint taken when the DD statements are opened out of order and
the referenced DD statement requested nonspecific tape volumes.
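A resubmitted deck that applies rules 1 through 3 might look like the sketch below. It assumes, hypothetically, that STEP1 created and cataloged MY.GDG(+1) in the original run and that MY.PASSED.DATA was conditionally cataloged when STEP2 terminated abnormally; all names are illustrative only.

```jcl
//MYJOB    JOB  MSGLEVEL=1,RESTART=STEP2
//* Rule 1: RESTART= names the step at which to restart.
//* STEP1 is interpreted but not initiated. In the original
//* run it created and cataloged MY.GDG(+1).
//STEP1    EXEC PGM=PROGA
//STEP2    EXEC PGM=MYPROG
//* Rule 3: that generation is now the current one, so it is
//* referred to as (0), not (+1).
//GDGIN    DD   DSN=MY.GDG(0),DISP=OLD
//STEP3    EXEC PGM=PROGC
//* Rule 2: the passed data set was conditionally cataloged,
//* so the catalog supplies its unit and volume serial.
//INDS     DD   DSN=MY.PASSED.DATA,DISP=(OLD,DELETE)
```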
Resource Variations Allowed in Deferred Step Restart


A deferred step restart merely allows the restarted execution of a job to begin at a
step other than the first step of the job. Job step initiation and allocation of resources
are accomplished normally. The following variations are allowed upon restart:

• Variation of device and volume configuration

• Variation in JCL and data in the resubmitted deck


Deferred Checkpoint Restart

You cause a deferred checkpoint restart of a job by the following procedure:

1. Optionally, code a special form of the RD parameter (RD=NR) in the
original job deck. This specifies that, if the CHKPT macro instruction
executes, a checkpoint entry is written but an automatic checkpoint restart is
not requested.

2. Cause execution of the CHKPT macro instruction, which writes a checkpoint
entry.

3. Resubmit the job whether or not it terminated abnormally. For example,
resubmit it because a volume of one of its input data sets was in error and
caused the corresponding part of an output data set to be in error.

4. Code the RESTART parameter (RESTART=(stepname,checkid)) on the JOB
statement of the restart deck. The parameter specifies both the step to be
restarted and the checkid that identifies the checkpoint entry to be used to
perform the restart.

5. Place a SYSCHK DD statement immediately before the first EXEC statement
in the restart deck. It specifies the checkpoint data set from which the
specified checkpoint entry is read and is in addition to any DD statements in the
deck that define data sets into which checkpoint entries are written.

Figure 12 on page 43 illustrates a job when it is originally submitted and when it is
resubmitted for a deferred checkpoint restart. Assume in Figure 12 that
STEP2, when originally executed, terminates abnormally at some time after
CH04 is written. Note that, in the resubmitted deck, the programmer requests
that STEP2 be restarted using the checkpoint entry identified as entry CH04.
Original Deck

//MYJOB    JOB  RD=NR                 Requests that
//*                                   automatic restart not
//*                                   occur (optional)
//STEP1    EXEC
//STEP2    EXEC PGM=MYPROG            MYPROG issues CHKPT
//*                                   macro
//NAME1    DD   DSN=NAME2,DISP=(,CATLG,KEEP)
//*                                   Describes checkpoint
//*                                   data set
//STEP3    EXEC


Resubmitted Deck

//MYJOB    JOB  RESTART=(STEP2,CH04)  Request restart at
//*                                   CH04 in STEP2
//SYSCHK   DD   DSN=NAME2             Describes data set
//*                                   that contains CH04
//STEP1    EXEC
//STEP2    EXEC PGM=MYPROG
//NAME1    DD   DSN=NAME2,DISP=(,CATLG,KEEP)
//*                                   Describes data set
//*                                   in which new
//*                                   checkpoint entries
//*                                   will be written
//STEP3    EXEC


Figure 12. Requesting a Deferred Checkpoint Restart
JCL Requirements and Restrictions on Deferred Checkpoint Restart


To perform a deferred checkpoint restart, you must provide the data set
environment required by the restart job, using conditional dispositions during the
original execution.


Conditional dispositions should be used to: 
• Keep all data sets used by the restart step.


• Catalog all data sets passed from steps preceding the restart step to steps
following the restart step. Although the step that terminates abnormally is not
using the passed data sets, its termination causes the cataloging of the data sets
if the conditional catalog parameter is used in the preceding steps.


Notes: 

1. Temporary data sets cannot be kept. 

2. If the DD statement for any data set at checkpoint time had 
DISP=(,DELETE,CATLG), the data set must be uncataloged after completion of 
the restarted job or job step by using IEHPROGM. 

Dynamically deallocated data sets have the disposition processed as specified at the
time of deallocation. Conditional disposition processing is not performed during
abend.
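In an original deck, these conditional dispositions might be coded as in the following sketch, assuming STEP2 issues CHKPT and is the step to be restarted later. All names are hypothetical, and UNIT=SYSDA and the SPACE values are installation-dependent assumptions; CATLG as the third subparameter both keeps and catalogs the data set on abnormal termination.

```jcl
//MYJOB    JOB  MSGLEVEL=1,RD=NR
//STEP1    EXEC PGM=PROGA
//* Passed to STEP3; cataloged if the job ends abnormally.
//PASSDS   DD   DSN=MY.PASSED.DATA,DISP=(NEW,PASS,CATLG),
//              UNIT=SYSDA,SPACE=(TRK,(5,5))
//STEP2    EXEC PGM=MYPROG
//* Every data set used by the restart step is kept on abend.
//OUTDS    DD   DSN=MY.OUTPUT.DATA,DISP=(NEW,CATLG,CATLG),
//              UNIT=SYSDA,SPACE=(TRK,(5,5))
//MASTER   DD   DSN=MY.MASTER.DATA,DISP=(OLD,KEEP,KEEP)
//* Checkpoint data set, cataloged so SYSCHK can name it later.
//CHKDD    DD   DSN=MY.CHKPT.DATA,DISP=(NEW,CATLG,CATLG),
//              UNIT=SYSDA,SPACE=(TRK,(10,5))
//STEP3    EXEC PGM=PROGC
//INDS     DD   DSN=MY.PASSED.DATA,DISP=(OLD,DELETE)
```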


The following rules must be adhered to when resubmitting a job for a deferred 
checkpoint restart: 




1. A RESTART parameter with a checkid subparameter must be coded on the 
JOB statement. 


2. A SYSCHK DD statement must be placed in the job deck immediately before
the first EXEC statement.


3. The EXEC statements in the job deck must have unique names. (The system 
searches for the named restart step.) 


4. The JCL statements and data in steps preceding or following the restart step 
can be different from their original forms. However, all backward references 
must be resolvable. 


5. The restart step must have a DD statement corresponding to each DD 
statement present in the step in the original deck, and the names of the 
statements must be the same as they were originally. However, the restart step 
can contain, in any position, more DD statements than it contained originally. 
The total number of volumes specified at restart must equal or exceed the 
number specified at the checkpoint. 


6. If a DD statement in the restart step in the original deck defined a data set that
was open at the checkpoint to be used, the corresponding statement in the
restart deck must refer to the same data set, and the data set must be on the
same volume and, in general, have the same extents recorded in its DSCB as it
did originally. (See the exceptions in the note that follows.) If the data set is
multivolume and is processed by the sequential access method (SAM), only the
part of the data set on the volume in use at the checkpoint needs to be the
same as it was originally.





Note: The extents can differ as follows:


• In the DD statement, you can request that additional space be allocated to
the data set when the space currently available is exhausted. If space is
allocated after a checkpoint is taken, this space is indicated in the DSCB;
after restart from the checkpoint, the space is released, and the DSCB
contents are changed to what they were at the checkpoint.


• In the DD statement, you can request that unused space be released at the
end of the job step. If the space is released, the DSCB may indicate a
reduced extent for the data set when deferred restart at a checkpoint
occurs; no space is allocated to replace released space. Space is not
released when step termination is followed by automatic restart.


JCL-specified data sets that were deallocated before the checkpoint are not
allocated at restart.


When there is no need to read or modify a data set after restart, the data set
can be replaced by a dummy data set if the original data set was processed by
SAM and the job step is not restarted from a checkpoint within the data set’s
end-of-volume exit routine. A VSAM data set using the ISAM compatibility
interface cannot be replaced by a dummy data set. Any dummy data set
present at the time of the checkpoint must be present as a dummy data set at
restart. Allocation is done for each DD statement in the job step where the
checkpoint is taken, even if the data set is closed at the time of the checkpoint.


7. The data in the restart step need not be the same as it was originally. If data 
following a DD * statement was present originally and is entirely omitted in the 
restart deck, the delimiter (/*) statement following the data may also be 
omitted. The delimiter statement following a DD DATA statement may not be 
omitted. 


8. The VOL parameter of a DD statement must refer to at least those volumes it
refers to at checkpoint time. More volumes may be added if desired. The
following DD statement parameters may not be changed: DISP, DCB, LABEL,
and UNIT.


9. Except for the requirements stated in rules 4 through 8, the JCL statements
and data in the restart step can be different from their original forms. In
particular, the DUMMY parameter can be used for any data set that was not open
at the checkpoint.


10. If data sets are passed from steps preceding the restart step to steps following 
it, the DD statements receiving the data sets must entirely define them. They 
must explicitly specify volume serial number, device type, data set sequence 
number, and label type, unless this information can be retrieved from the 
catalog. It is recommended that passed data sets be conditionally cataloged 
during abnormal termination of the original execution. Label type cannot be 
retrieved from the catalog. 
11. The EXEC statement PGM and COND parameters and the DD statement
VOL=REF parameter must not be used in the restart step or in steps following 
the restart step if they contain values of the form stepname, or 
stepname.procstepname, referring to a step preceding the restart step. 




12. The DD statement VOL=REF parameter is ignored if restart is attempted from 
a checkpoint taken when the DD statements are opened out of sequence and 
the referenced DD statement requested nonspecific tape volumes. 


13. The volume serial number must be coded on a restart DD for a generation data 
group if the deferred checkpoint restart volume serial number list is to be 
different from the original. 
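A restart deck that follows several of these rules might look like the sketch below. It assumes, hypothetically, that the original STEP2 defined OUTDS, MASTER, CHKDD, and REPORTIN, that REPORTIN was not open at checkpoint CH04 and was processed by SAM, and that the checkpoint data set is cataloged as MY.CHKPT.DATA; all names and the UNIT and SPACE values are illustrative assumptions.

```jcl
//MYJOB    JOB  MSGLEVEL=1,RESTART=(STEP2,CH04)
//* Rule 2: SYSCHK comes immediately before the first EXEC and
//* names the data set that holds checkpoint entry CH04.
//SYSCHK   DD   DSN=MY.CHKPT.DATA,DISP=OLD
//STEP1    EXEC PGM=PROGA
//* Rule 3: step names are unique.
//STEP2    EXEC PGM=MYPROG
//* Rules 5, 6, and 8: same ddnames, data sets, and volumes as
//* the original STEP2; DISP, DCB, LABEL, and UNIT unchanged.
//OUTDS    DD   DSN=MY.OUTPUT.DATA,DISP=(NEW,CATLG,CATLG),
//              UNIT=SYSDA,SPACE=(TRK,(5,5))
//MASTER   DD   DSN=MY.MASTER.DATA,DISP=(OLD,KEEP,KEEP)
//CHKDD    DD   DSN=MY.CHKPT.DATA,DISP=(NEW,CATLG,CATLG),
//              UNIT=SYSDA,SPACE=(TRK,(10,5))
//* Rule 9: a data set that was not open at CH04 may be
//* replaced by DUMMY.
//REPORTIN DD   DUMMY
//STEP3    EXEC PGM=PROGC
//INDS     DD   DSN=MY.PASSED.DATA,DISP=(OLD,DELETE)
```

As in Figure 12, the NEW dispositions are left as originally coded; during a checkpoint restart the system changes them to OLD.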


Resource Variations Allowed in a Deferred Checkpoint Restart 


The system’s device and volume configuration can differ from the status during the
original execution of the job. The allowable differences are those described under
“Resource Variations Allowed in Automatic Restart” on page 36.


System Restrictions for Deferred Checkpoint Restart 


The following restrictions apply when submitting a job for deferred checkpoint 
restart: 


• Jobs specifying nonpageable storage require an area of storage identical to the
storage originally requested at the time the checkpoint is taken and starting at
the same address.


• The link pack area modules in use at the time the checkpoint is taken must
reside in the same storage locations for the job submitted for deferred
checkpoint restart.


• The nucleus must not be changed between checkpoint and restart.


If the required storage is not available, it may be for one of the following reasons:

• The link pack area expands into the required storage.


• The system queue area expands into the required storage.


If either of the above conditions occurs, contact your system programmer for a
respecification of the system parameter to modify the area size, and repeat initial
program loading using the new value.


How the System Works during a Deferred Checkpoint Restart 


After the system reads and interprets the restart deck, it reads the specified 
checkpoint entry and merges information from it into the scheduler work area entry 
for the job. As a result, the work queue entry differs from the entry existing during 
the original execution, as described earlier. (See “How the Job Deck Is
Reinterpreted and the SWA Merged” on page 37.)
Next, the system initiates the restart step normally. The system reads the specified
checkpoint entry again and functions as in automatic restart. Restart is delayed 
until the required virtual-storage area is available. 





When a job restarts correctly, you receive two messages: IHJ006I and IHJ008I. If
these messages do not appear for a job that uses nonpageable storage, enter a
DISPLAY A command to see whether a system task or another job is using the
required storage. You can then stop or cancel the conflicting job.


If you have multiple tape or disk volumes, the system may ask the operator to 
mount data volumes other than those required at the beginning of the job. In 
addition, any card input data sets that are used by the failing job step must again be 
made available to the system. 


Coding the RD and RESTART Parameters 


RD (Restart Definition) Parameter 


The RD parameter is coded on the JOB or EXEC statement to request an automatic
step restart if failure occurs, to suppress (partially or totally) the action of
the CHKPT macro instruction, or both. If the RD parameter requests an automatic
step restart if failure occurs, or if the RD parameter is not coded, the action of
CHKPT is normal. (CHKPT writes a checkpoint entry and requests that a checkpoint
restart be performed if failure occurs.) The RD parameter is ignored for system
tasks and generalized start jobs.


The RD parameter, coded on an EXEC statement, applies to the step 
corresponding to the statement or to all steps of the cataloged procedure referred 
to by the statement. Coded on a JOB statement, the RD parameter applies to all 
steps of the corresponding job and overrides an RD parameter coded in any EXEC 
statement of the job. The syntax of the parameter is: 


RD[.procstepname]={R | NC | NR | RNC} 





The possible definitions are: 


RD=R 
(Restart) Requests an automatic step restart if failure occurs. If the CHKPT 
macro instruction is executed in the step, the resulting request for an 
automatic checkpoint restart overrides the request for an automatic step 
restart. This parameter is ignored if the job does not contain a job journal. 
For JES2 and JES3, this parameter forces job journaling. 


RD=NC 
(No checkpoint) Does not request an automatic step restart. It totally 
suppresses the action of the CHKPT macro instruction if the macro 
instruction executes in the step. This allows use of a program containing 
CHKPT when the checkpoint function is not wanted. It can also suppress
the checkpoint at end-of-volume.


RD=NR 
(No automatic restart) Does not request an automatic step restart. It 
suppresses the request for an automatic checkpoint restart that is made when 
the CHKPT macro instruction executes in the step. If CHKPT executes, it 
writes a checkpoint entry normally. A deferred restart can be performed 
from the checkpoint entry. 


RD=RNC 
(Restart and no checkpoint) Requests an automatic step restart if failure 
occurs. It totally suppresses the action of CHKPT if CHKPT executes in the 
step. It can suppress the checkpoint at end-of-volume. If the job does not 
contain a job journal, the step is ineligible for automatic restart. For JES2 
and JES3, this parameter forces job journaling. 
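The four RD values can be summarized as a decision table. The Python sketch below is illustrative only. The names RdEffect, RD_EFFECTS, and automatic_restart_kind are hypothetical, the key None stands for a step in which RD is not coded, and has_job_journal should be taken as True under JES2 and JES3, where RD=R and RD=RNC force job journaling. As in the RD descriptions, the job-journal condition is applied only to step restart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RdEffect:
    step_restart_requested: bool       # automatic step restart on failure
    chkpt_writes_entry: bool           # CHKPT still writes a checkpoint entry
    chkpt_requests_ckpt_restart: bool  # CHKPT requests automatic checkpoint restart


# Hypothetical table built from the RD=R, NC, NR, and RNC descriptions.
# The key None models a step in which RD is not coded.
RD_EFFECTS = {
    None: RdEffect(False, True, True),
    "R": RdEffect(True, True, True),
    "NC": RdEffect(False, False, False),
    "NR": RdEffect(False, True, False),
    "RNC": RdEffect(True, False, False),
}


def automatic_restart_kind(rd: Optional[str], chkpt_executed: bool,
                           has_job_journal: bool) -> Optional[str]:
    """Return "checkpoint", "step", or None for a step that fails."""
    effect = RD_EFFECTS[rd]
    # A checkpoint restart request from CHKPT overrides a step restart request.
    if chkpt_executed and effect.chkpt_requests_ckpt_restart:
        return "checkpoint"
    # Without a job journal, RD=R is ignored and RD=RNC makes the step ineligible.
    if effect.step_restart_requested and has_job_journal:
        return "step"
    return None
```

For example, automatic_restart_kind("NR", True, True) returns None, yet RD_EFFECTS["NR"].chkpt_writes_entry is True: no automatic restart occurs, but a deferred restart can still use the checkpoint entry.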


If RD=value is coded on an EXEC statement that invokes a cataloged procedure, 
the parameter applies to all steps of the procedure and overrides all RD parameters 
present in the EXEC statements of the procedure. RD.procstepname=value can be 
coded instead of RD=value; it applies to the specified procedure step and overrides 
the RD parameter that may be coded on the EXEC statement of the procedure 
step. RD.procstepname=value can be coded once for each step of the procedure. 
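The override order for one procedure step can be sketched as a simple precedence list. In this hypothetical Python function, a value coded earlier in the list wins. Because the text does not say how RD=value and RD.procstepname=value interact when both are coded on the same EXEC statement, the sketch rejects that combination instead of guessing.

```python
from typing import Optional


def resolve_rd(job_rd: Optional[str] = None,
               exec_rd: Optional[str] = None,
               exec_procstep_rd: Optional[str] = None,
               proc_step_rd: Optional[str] = None) -> Optional[str]:
    """Return the RD value that governs one step of a cataloged procedure.

    job_rd           -- RD coded on the JOB statement
    exec_rd          -- RD=value on the EXEC statement that invokes the procedure
    exec_procstep_rd -- RD.procstepname=value naming this step on that EXEC
    proc_step_rd     -- RD coded on the procedure step's own EXEC statement
    """
    if exec_rd is not None and exec_procstep_rd is not None:
        # Not described in the text; refuse rather than guess.
        raise ValueError("RD=value combined with RD.procstepname=value is not modeled")
    # JOB-level RD overrides everything; the invoking EXEC overrides the procedure.
    for value in (job_rd, exec_rd, exec_procstep_rd, proc_step_rd):
        if value is not None:
            return value
    return None
```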


RESTART Parameter 


The RESTART parameter requests a deferred restart of a job. It is coded on the
JOB statement when the job is resubmitted. If step restart is to occur, this
parameter specifies the step at which to begin. If the restart is to occur at a
checkpoint that was taken during a step, both the step and the identification of
the particular checkpoint entry are specified. The syntax of the parameter is:


RESTART=({stepname | stepname.procstepname | *}[,checkid]) 





Both operands are used if restart at a checkpoint is to occur. If a step restart is to 
occur, checkid must be omitted; the enclosing parentheses may be omitted. 


stepname
is coded as stepname.procstepname if a step of a cataloged procedure is to be
restarted. The parameter can be coded as * if the first step of the job
(possibly a step of a cataloged procedure) is to be restarted.


checkid
can contain up to 16 characters in any combination of alphameric characters,
printable special characters, and blanks. If it contains any special characters
or blanks, it must be enclosed in apostrophes, and each apostrophe within it
must be coded as two consecutive apostrophes.
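The checkid coding rules lend themselves to a small formatter. The Python sketch below is hypothetical: the function names and the step and checkpoint names in the examples (STEP2, CHK0001) are illustrative. It assumes that only uppercase letters and digits can be coded without apostrophes, so national characters (@ # $) and lowercase letters are conservatively quoted.

```python
import string
from typing import Optional

# Characters this sketch codes without apostrophes: uppercase letters and digits.
PLAIN = set(string.ascii_uppercase + string.digits)


def quote_checkid(checkid: str) -> str:
    """Return checkid as it would be coded in the RESTART parameter."""
    if not 1 <= len(checkid) <= 16:
        raise ValueError("checkid must contain 1 to 16 characters")
    if not checkid.isprintable():
        raise ValueError("checkid must contain only printable characters")
    if all(c in PLAIN for c in checkid):
        return checkid
    # Special characters or blanks: enclose in apostrophes and double any
    # apostrophe that is part of the checkid itself.
    return "'" + checkid.replace("'", "''") + "'"


def restart_param(step: str, checkid: Optional[str] = None) -> str:
    """Build RESTART=stepname (step restart) or RESTART=(stepname,checkid)."""
    if checkid is None:
        return f"RESTART={step}"  # parentheses may be omitted for step restart
    return f"RESTART=({step},{quote_checkid(checkid)})"
```

For example, restart_param("STEP3", "PAY RUN") yields RESTART=(STEP3,'PAY RUN'), and a checkid of IT'S is coded as 'IT''S'.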
SYSCHK DD Statement


The SYSCHK DD statement must be included in the resubmitted job to perform a 
deferred checkpoint restart. This DD statement specifies the checkpoint data set 
that contains the checkpoint entry to be used in the restart. The SYSCHK DD 
statement must not be included when a deferred step restart is to be performed.


The statement must precede the first EXEC statement in the deck that performs a 
deferred restart at checkpoint. It must follow the JOBLIB DD statement if the 
JOBLIB DD is present. The desired checkpoint entry must be named by the 
checkid subparameter of the JOB statement RESTART parameter. 


The following requirements and restrictions apply to the SYSCHK DD statement: 
• The statement must contain or imply DISP=(OLD,KEEP).


• The statement must define the checkpoint data set. It must specify its name,
device type, and volume serial number. If the data set is cataloged, the device
type and volume serial number can be omitted.


• If the volume containing the checkpoint data set is mounted on a
JES3-managed device, the SYSCHK DD statement must not request deferred
mounting.


• The SYSCHK DD statement cannot define a multivolume or concatenated data
set. If the checkpoint data set is multivolume, the SYSCHK DD statement must
specify, as the first volume of the data set, the volume that contains the
desired checkpoint entry, together with the data set name. The serial number of
the volume containing a particular entry appears in the console message that is
issued when the entry is written.


• If the checkpoint data set is partitioned, the DSNAME parameter on the
SYSCHK DD statement must not contain a member name.


• If a RESTART parameter without the checkid subparameter is included in a
job, a SYSCHK DD statement must not appear before the first EXEC
statement of the job.


• If a RESTART parameter is not included in a job, a SYSCHK DD statement
appearing before the first EXEC statement in the job is ignored.


• A SYSCHK DD statement appearing in a step or procedure step of a job is
treated as an ordinary DD statement; that is, the name SYSCHK has no special
meaning in that case.
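The placement rules for a SYSCHK DD statement that precedes the first EXEC statement reduce to three outcomes. This hypothetical Python helper encodes them. A SYSCHK DD statement inside a step is an ordinary DD statement and is deliberately outside its scope.

```python
def syschk_before_first_exec(restart_coded: bool, checkid_coded: bool) -> str:
    """Classify a SYSCHK DD statement placed before the first EXEC statement."""
    if not restart_coded:
        return "ignored"        # no RESTART parameter on the JOB statement
    if checkid_coded:
        return "required"       # deferred checkpoint restart
    return "not allowed"        # deferred step restart
```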


An example of a SYSCHK DD statement is: 


//SYSCHK DD DSN=dsname,DISP=OLD,UNIT=name,
//          VOL=SER=volser
Chapter 4. Checkpoint/Restart Processing


This chapter describes how the checkpoint/restart routine handles different types 
of user data sets, and gives additional guidelines on repositioning and preserving 
the contents of data sets. There are also some notes at the end of the chapter on 
using checkpoint/restart with various programs and programming languages. 


Notes for User Data Sets 


The data set types discussed are listed below:

• BDAM data sets
• Generation data sets
• ISAM data sets
• MSS data sets
• Partitioned data sets
• SAM data sets
• SYSABEND data sets
• SYSIN data sets
• SYSOUT data sets
• TCAM data sets
• Temporary data sets
• VSAM data sets
• Work data sets


BDAM Data Sets


When the basic direct access method (BDAM) is used to process a data set,
processing resumes normally upon restart. However, you must ensure that a
particular block is read or written before a checkpoint is taken. Your program
must complete the BDAM I/O operation by executing the CHECK or WAIT
macro before it executes the CHKPT macro. If the program does not complete the
operation, the block may be read or written either before or after the checkpoint
is taken.


Any change made to a record between a checkpoint and a restart remains in the
data set. You are responsible for backing out any changes that would otherwise
produce invalid results. For more information, see “Preserving Data Set Contents”
on page 63.


For serially reusable resources, your program must also issue either the WRITE or
RELEX macro to release a record that was read with exclusive control before
executing the CHKPT macro.


Generation Data Sets


No automatic cataloging of generation data sets takes place. If certain generation 
data sets of a generation data group are to be cataloged, you must catalog them. 
The order in which they are cataloged determines the relative generation numbers 
of the generation data sets for reference by later jobs. The last generation data set 
becomes the 0 generation, the next-to-last cataloged generation data set becomes 
the -1 generation, and so on. 
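Because relative numbers follow cataloging order rather than the names of the generations, the mapping can be sketched as follows. The function and the data set names in the tests are hypothetical; the sketch covers only the numbering rule stated above.

```python
from typing import Dict, List


def relative_generations(cataloged_in_order: List[str]) -> Dict[str, int]:
    """Map each generation data set to its relative generation number.

    cataloged_in_order lists names in the order in which you cataloged them.
    The last one cataloged becomes generation 0, the next-to-last -1, and so on.
    """
    newest = len(cataloged_in_order) - 1
    return {name: position - newest
            for position, name in enumerate(cataloged_in_order)}
```

If PAY.G0007V00 is cataloged before PAY.G0006V00, the sketch makes PAY.G0006V00 the 0 generation, which is why the cataloging order matters to later jobs.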


Generation data sets created by one step of a job may be passed to subsequent 
steps in the same job and may be referenced by the relative generation numbers 
assigned at the time of creation, whether or not the generations were cataloged. 


For deferred step restart, the generation data group name table (GDGNT) is 
recreated from the catalog. The last generation data set cataloged prior to 
termination of the job becomes the 0 generation and is used for the base name in 
the GDGNT. This may not be the same as the base name when the job was 
initially run; you must know which generation data sets were cataloged and in what 
order the data sets were cataloged, and the JCL must be modified accordingly. 


For a deferred checkpoint restart, no modification of the relative generation 
number in the JCL is necessary. 


When a job is started, the base name (0 generation name) of each generation data 
group is placed in the GDGNT. The GDGNT is never changed unless a new 
generation level is cataloged. It is saved at a checkpoint and is available for both 
automatic and deferred checkpoint restart. The restart takes place without any 
change in the JCL, whether or not generations previously created by the job were 
cataloged. 


When using generation data groups, specify the disposition of the data set
correctly to avoid having to change the JCL at restart time. If
DISP=(NEW,CATLG,CATLG) was specified, the generation data group number is
set to (0) at the time of restart, and the checkpointed data set is now at
GDG(-1), regardless of how the job completed. For generation data sets to keep
their respective generation numbers at checkpoint and restart time, the
disposition should specify DISP=(NEW,CATLG,KEEP). This prevents the
generation level from changing if the job step fails. If, however, you wish to
restart a job that completed without failing, you must change the JCL of the job
to be resubmitted to get the correct generation levels.


You must take precautions when using a generation data set as the checkpoint
data set. For a deferred checkpoint restart, the checkpoint data set, as specified in
the SYSCHK DD statement, is allocated to the initiator, and an entry is made in
the initiator’s GDGNT. The GDGNT is never updated and remains for the
duration of the job and for any future restarts under the same initiator. Any later
restart under that initiator that uses a data set from the same generation data
group obtains the qualified name of the generation data set from the original
contents of the GDGNT.


After entry into the GDGNT, if the levels of the generation data group change in 
the catalog and if the checkpoints are of a different generation, the desired 
generation at restart time may not be the one retrieved as the checkpoint data set. 
To avoid this problem, do not make changes to the number of existing generations 
for the duration of the IPL after the checkpoint data set is used for deferred restart. 
Be sure the same initiator is not used twice to do restarts from different levels of 
the same generation data group. 


Note: A job that contains generation data sets referred to by a relative generation 
number of +1 or greater and with a disposition of OLD, SHR, or MOD is failed by 
the JES3 interpreter service unless the UNIT parameter is included in the DD 
statement. With UNIT specified, the JES3 interpreter service permits this data set 
to be allocated on a deferred basis. 


ISAM Data Sets


A checkpoint should not be taken before an ISAM data set is opened in load mode.
Instead, take a checkpoint immediately after the data set is opened. Otherwise,
an abend results from a restart at the earlier checkpoint.


A restart should not be attempted from checkpoints taken during loading of an 
ISAM data set using QISAM in load mode, if insertions were made on the ISAM 
data set after it was loaded, and if the insertions were made using the WRITE KN 
macro. 


An ISAM data set that is shared must be closed before taking a checkpoint. Note 
that, if an ISAM data set is closed immediately after restarting the program at a 
checkpoint, the data set may not be restored to its original condition. 


Any change made to a record between a checkpoint and a restart remains in the
data set. You are responsible for backing out any changes that would otherwise
produce invalid results. For more information, see “Preserving Data Set Contents”
on page 63.
MSS (Mass Storage System) Data Sets


Restart delays one second for each MSS volume that must be mounted, plus the 
time required to stage cylinder zero, the VTOC, and each data set. 


Partitioned Data Sets 


If a partitioned data set is compressed after a checkpoint in which the partitioned 
data set was open, restart at checkpoint should not be requested. 


If a checkpoint is taken and a partitioned data set is opened, another checkpoint 
should be taken before any records are written into the data set. If the second 
checkpoint is not taken and restart occurs at the first checkpoint, the OPEN 
routine positions to the current end of the data set instead of to the original end. 


Adding Members 


When a partitioned data set is updated, be careful to preserve the contents of the 
directory. The directory consists of entries that point to each member of the data 
set. 


When a member is added to a partitioned data set, an entry is also added to the 
directory. If one member is added, the STOW macro may be used to create the 
entry, or the member name may be specified in the DD statement. In the latter 
case, the control program creates the directory entry when the data set is closed or 
when the job step terminates. If more than one member is added, the STOW 
macro must be used to create an entry for each member, and a new checkpoint 
should be taken after each use of the STOW macro. 


When one or more members are added to a partitioned data set, a checkpoint 
should be taken immediately after opening the data set. After taking the 
checkpoint, the new member may be written and its entry added to the directory. 
If the step is restarted from the checkpoint, the data set is then repositioned; the 
new member and its directory entry are deleted and are re-created after restart. 


Updating Members 


To update a member of a partitioned data set, updated records may either be 
written back to their original locations, or the entire member (in updated form) 
may be rewritten as a new member of the data set. In the latter case, the directory 
entry must be updated to point to the rewritten member. 


If a checkpoint is taken before rewriting an entire member, one must also be taken 
immediately after updating the directory, because the control program deletes the 
updated directory entry if it repositions the data set for restart from the original 
checkpoint. Because no entry then points to the original member, the postrestart 
processing will be unsuccessful. 
Deleting Members

Members may be deleted from a partitioned data set during a restart. Because this 
action may delete members written by another job (another job may have been 
executed between the original and restart executions of the subject job), restart at a 
checkpoint should not be requested. 


If a partitioned data set is open for output at checkpoint time, any new members
added later are deleted during a restart from this checkpoint. No members are
deleted when a partitioned data set is open for update, so use caution when
restarting from checkpoints taken while partitioned data sets are open for
update.


SAM Data Sets 


When BSAM or QSAM is used to read a data set from a card reader, your program 
can reposition the data set upon restart. If you provide a repositioning routine, 
instruct the operator to position the data set to the beginning if a restart becomes 
necessary. The program might be designed to operate as follows: 


• The program saves the first record read from the data set and keeps a count of
the number of records read before each checkpoint.


• After a restart, the repositioning routine reads a record from the data set and
compares it with the first record read before abnormal termination. If the
records are identical, the data set has been positioned to the beginning. The
routine then repositions it by reading (without otherwise processing) the
number of records read before the checkpoint.
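The repositioning design described above can be sketched in Python. RepositioningReader is a hypothetical class, and an iterator of strings stands in for the card reader. In a real step, checkpoint/restart restores the saved first record and the count along with the rest of program storage; the sketch models that by reusing the same object. Note that the comparison read counts as the first of the records to be skipped.

```python
from typing import Iterator, Optional


class RepositioningReader:
    """Hypothetical card-input reader that can reposition after a restart."""

    def __init__(self) -> None:
        self.first_record: Optional[str] = None
        self.records_read = 0
        self.count_at_checkpoint = 0

    def read(self, deck: Iterator[str]) -> str:
        record = next(deck)
        if self.first_record is None:
            self.first_record = record  # saved for the post-restart comparison
        self.records_read += 1
        return record

    def checkpoint(self) -> None:
        # The program would issue CHKPT here; remember how far it had read.
        self.count_at_checkpoint = self.records_read

    def reposition(self, deck: Iterator[str]) -> None:
        """Call after restart, once the operator has put the deck at the beginning."""
        if self.count_at_checkpoint > 0:
            if next(deck) != self.first_record:
                raise RuntimeError("input is not positioned at the beginning")
            # The comparison read is the first record; skip the rest unprocessed.
            for _ in range(self.count_at_checkpoint - 1):
                next(deck)
        self.records_read = self.count_at_checkpoint
```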


SYSABEND Data Sets 


Whether or not checkpoint/restart is used, abnormal termination causes the system 
to write a SYSABEND (or SYSUDUMP) data set if you provide a SYSABEND (or 
SYSUDUMP) DD statement. The system uses its own data control block to write 
the data set, and it opens the data set during abnormal termination processing. 

You may code or omit the SYSOUT parameter on the SYSABEND DD statement. 


When the SYSOUT parameter is coded and automatic restart occurs after 
abnormal termination, the SYSABEND or SYSUDUMP data set is not printed for 
step restart. Because the SYSABEND or SYSUDUMP data set is created by the 
job step, it is deleted during restart. 


In all other cases, the SYSABEND or SYSUDUMP data set is printed, whether or
not the restart is successful. If a second abnormal termination occurs, a second
SYSABEND or SYSUDUMP data set is written. The same rules that apply to the
first data set also apply to the second data set and to all subsequent data sets.
SYSIN Data Sets


When restart at a checkpoint occurs, a SYSIN data set (data following a DD * or
DD DATA statement) is repositioned. Unit-record data sets are never
repositioned. (The checkpoint routine waits until all requested input/output
operations are complete, then requests that the job entry subsystem save
positioning information.) When automatic restart occurs, the system keeps the
direct access data sets that contain the SYSIN data of the restarting job. During
the restart execution, the job can read data from the direct access data sets as it
could during the original execution.


To perform a deferred restart, include any necessary SYSIN data in the
resubmitted deck. If the restart is to be at a checkpoint, and a SYSIN data set is
open and not completely read at the checkpoint to be used, the attributes of the
direct access data set (into which the system will write the SYSIN data) must be
the same as the attributes of the direct access data set used originally. (The
location and number of extents in the data set used during restart need not be
the same.)


Information about altering SYSIN data in a restart deck is given under “JCL
Requirements and Restrictions on Automatic Restart” on page 36.


SYSOUT Data Sets


The following describes how SYSOUT data sets (data sets having the SYSOUT
parameter coded on their DD statements) are handled during various types of
restart.


Automatic Restart: Your program writes SYSOUT data into one or more direct 
access data sets. If step restart is occurring, the direct access data sets used during 
the original execution are deleted. New direct access data sets are allocated when 
the step restarts. 


If a checkpoint restart occurs, the data sets used during the original execution are
kept. If a SYSOUT data set was open when the last checkpoint was taken, it is
repositioned to its position at the time the checkpoint was taken.


The checkpoint routine waits until all requested input/output operations are 
complete, then requests that the job entry subsystem save positioning information. 
Data written during the restart execution overlays only the data written between 
the time the last checkpoint was taken and the time the job step terminated 
abnormally. If a SYSOUT data set is closed at the checkpoint, the data set is not 
repositioned. If the restart step opens the same data set again, the data written 
during the restart follows the data originally written. (The data set has an implied 
disposition of MOD.) 


Deferred Checkpoint Restart: When a checkpoint restart occurs, and a SYSOUT 
data set is open at the checkpoint, the data set written into during the restart is a 
new data set different from the data set used originally. 
TCAM Data Sets


A successful restart of a telecommunications access method (TCAM) data set
depends on the following conditions:


• The message control program (MCP) region must be active and have enough
virtual storage to build the required control blocks.


• The QNAME parameter in the DD statement of the checkpoint job must be
available in the terminal table of the MCP region.


Temporary Data Sets


Direct-access space for temporary data sets can be preallocated to save time in
scheduling job steps but cannot be used with checkpoint/restart. Checkpoints and
automatic restarts are suppressed for any job step that uses a preallocated
temporary data set.


VSAM Data Sets


General Information


For VSAM data sets, you are responsible for handling checkpoint/restart problems 
that arise because of changes in the data. For example, consider a program that 
updates records in a data set by adding a number to a value already existing in 
some field within the record. If the program terminates and is restarted, you must 
ensure that the records processed between the checkpoint and the termination are 
not processed again after the restart. 


During checkpoint restart processing, no user ACB exits are taken. For checkpoint 
processing, the AMB exception exit is taken; for restart processing, the AMB 
exception exit is not taken. 


The checkpoint program issues a VSAM temporary CLOSE macro instruction to 
update the catalog. It records information about VSAM data sets in a checkpoint 
data set. If a failure occurs, the latest checkpoint record is used to reconstruct the 
situation that prevailed when the checkpoint was taken. 


When the checkpoint routine records positioning for a VSAM data set, all 
outstanding I/O requests for the data and index are completed before the contents 
of your address space are saved. If an error occurs while these I/O requests are 
processed, the checkpoint procedure stops and a code of X'0C' is returned in 
register 15. You may handle the error condition and reissue the CHKPT macro. 


A checkpoint cannot be taken if VSAM data sets are open for regions that are 
using the control blocks in common (CBIC) option. 
The AMP Parameter


The AMP parameter has a subparameter for specifying checkpoint/restart options 
that handle two special situations in restarting a processing program: 


• Modifications to the data set other than records added sequentially to the end
of an entry-sequenced data set. The restart program cannot restore a data set
to its checkpoint status if there have been internal modifications to it since the
checkpoint, so the restart program does not attempt restart processing.


• Addition of records to the end of a data set by way of a job step other than the
job step that issued the checkpoint. Any records added to the end of an
entry-sequenced data set are erased in restoring the data set to its checkpoint
status.


The AMP options for checkpoint/restart are:

• Let restart take its normal action for either situation.
• Override one or the other of the two actions.
• Override both.


If you override the check for internal modification, the processing program is 
restarted, even though the data set being processed cannot be restored. If you 
override erasing data at the end of a data set and the catalog has been updated, the 
processing program is not restarted unless you also override the check for 
modification. For more detail on multiple ACBs open against the same data set, 
see “ACB Macro” in VSAM Administration: Macro Instruction Reference. 


To prevent data from being erased or to allow restart with modified data sets, the 
AMP subparameter must be coded in the DD statement for the cluster or data set. 
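The interaction between the two overrides can be sketched as a small decision function. The encoding below is hypothetical. NCK, NRE, and NRC are the CROPS values this chapter mentions. The name RCK for the default "normal action" comes from the JCL reference rather than from this section, so treat it as an assumption.

```python
from typing import Tuple

# Hypothetical encoding of the CROPS values as
# (check for internal modification, erase records added at the end).
CROPS = {
    "RCK": (True, True),    # normal action for both situations (assumed name)
    "NCK": (False, True),   # override the modification check
    "NRE": (True, False),   # override erasing data at the end
    "NRC": (False, False),  # override both
}


def restart_outcome(crops: str, modified_internally: bool,
                    catalog_updated: bool) -> Tuple[bool, bool]:
    """Return (restart_proceeds, trailing_records_erased)."""
    check, erase = CROPS[crops]
    if check and modified_internally:
        return (False, False)  # the data set cannot be restored
    if check and not erase and catalog_updated:
        return (False, False)  # erase overridden, but the catalog has changed
    return (True, erase)
```

For example, restart_outcome("NRE", False, True) refuses the restart, while restart_outcome("NRC", True, True) restarts the program even though the data set cannot be restored.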


All multiple ACBs open for output connected to the same control block structure 
must have identical checkpoint restart AMP CROPS options. If this is not done, 
the results are unpredictable. 


Repositioning is mandatory for all VSAM data sets open for create mode
processing, except for relative record data sets processed in direct mode. If
AMP=‘CROPS=NRE’ or AMP=‘CROPS=NRC’ is specified, no checkpoint is taken. If a
checkpoint is attempted, message IHJ000I is issued, a return code of X'08' is
returned in register 15, and a reason code of 41 is returned in register 0.


If a VSAM data set supported for repositioning extends to a new volume after the
checkpoint, VSAM restart cannot reposition the data set. A restart from that
checkpoint is not successful unless the no-reposition option is taken by specifying
AMP=‘CROPS=NRE’ or AMP=‘CROPS=NRC’.
VSAM Data Set Types


VSAM has key-sequenced, entry-sequenced, and relative record data sets. The
main difference among the three is the order in which the data records are loaded
into the data sets.


Records are loaded into a key-sequenced data set (KSDS) in key sequence, the order
defined by the collating sequence of the contents of the key field in each of the
records. Each record has a unique value in the key field, such as employee number
or invoice number. VSAM uses an index and optional free space to insert a new
record into the data set in key sequence.


Records are loaded into an entry-sequenced data set regardless of the contents of the
records. Their sequence is determined by the order in which they are physically
arranged in the data set, that is, their entry sequence. New records are stored at
the end of the data set.


Records are loaded into a relative record data set in relative record number
sequence. The data set is a string of fixed-length slots, each identified by a
relative record number. When a record is inserted, you can assign the relative
record number or allow VSAM to assign the record the next available number in
sequence. No index is used.


Key-Sequenced Data Sets


A key-sequenced data set is prepared for restart by the restoration of any statistical
information (such as number of records inserted) to its checkpoint status.


For a VSAM key-sequenced data set, restart does not erase any data except in
create mode. It does, however, detect modification of the data set by either the
checkpointed program or another program that used the data set between the
checkpoint and the restart. If the data set is modified, the restart is terminated,
unless you override the testing of the data set by coding the AMP=‘CROPS=NCK’
subparameter in the DD statement for the data set.


Entry-Sequenced Data Sets
A checkpoint may not be taken if a VSAM entry-sequenced data set is open for 
output with an immediate-upgrade path (or alternate index) open over it, unless the 
no-reposition option, AMP=‘CROPS=NRE’ or AMP=‘CROPS=NRC’, is 
specified. VSAM immediate-upgrade data sets are key-sequenced data sets, and 
repositioning is not supported for them. 


For a VSAM entry-sequenced output data set, all data added after the last
checkpoint was taken is physically erased unless AMP=‘CROPS=NRE’ or
AMP=‘CROPS=NRC’ is specified in the DD statement for the data set. If data is
erased, the catalog record for the data set is updated to reflect the current end of
data, and the data set statistics are adjusted to reflect the new status.


During restart, entry-sequenced output data sets are restored by the elimination of 
all records added at the end since the checkpoint. 
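The restoration of an entry-sequenced output data set can be pictured as truncation back to the record count at checkpoint time. In this hypothetical Python sketch, a list stands in for the data set in entry sequence. It models only the erase step; the modification check and the catalog update are not modeled.

```python
from typing import List, Sequence


def restore_esds(records: Sequence[str], count_at_checkpoint: int,
                 crops: str = "RCK") -> List[str]:
    """Return the entry-sequenced data set as restart would leave it.

    records             -- current records, in entry sequence
    count_at_checkpoint -- number of records present when the checkpoint was taken
    crops               -- CROPS value; "RCK" stands for the default (assumed name)
    """
    if crops in ("NRE", "NRC"):
        return list(records)                    # no erase: later records are kept
    return list(records[:count_at_checkpoint])  # records added since are erased
```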
Relative Record Data Sets


A checkpoint may not be taken for a VSAM relative record data set in load mode if
direct processing was or is performed.


If a checkpoint is taken with a relative record data set open for load mode
nondirect processing and direct processing is performed after the checkpoint (thus
affecting data that was loaded before the checkpoint was taken), restart makes no
attempt to reset the data that existed at checkpoint time. After restart, the
data set is reset to checkpoint status but still contains the results of any direct
processing performed on that part of the data set that existed at checkpoint time.


Work Data Sets
Many programs use “work” data sets, which are alternately written and read,
rewritten and reread. If a work data set is used, a checkpoint should be taken each
time you have finished reading the data set and before rewriting it.


For example, a program may perform the following sequence of operations to 
produce different versions of data set A: 


1. Write and then read back A1. 

2. Write and then read back A2. 

3. Write and then read back A3. 

A checkpoint should be taken at the beginning of operations 2 and 3 before any
rewriting of data set A takes place. If, for example, the job step is abnormally
terminated while operation 2 is in progress, the job step can be restarted from the
checkpoint taken at the beginning of operation 2. At this checkpoint, there is no
need for the data in version A1.


Notes for Tape Labels and Files 


The tape files and labels discussed are listed below:

• DOS tape files
• ISO/ANSI tape labels
• Nonstandard tape labels


DOS Tape Files
Checkpoints may be taken with DOS tape files opened with the bypass leading 
tapemark option LABEL=(,LTM) and/or the bypass embedded DOS checkpoint 
records option DCB=(OPTCD=H) specified. However, a checkpoint must not be 
taken when an opened data set resides on a DOS 7-track tape, is written in 
translate mode, and contains embedded checkpoint records. 
ISO/ANSI Tape Labels 


An ISO/ANSI tape data set may not be used as a checkpoint data set. 


If an ISO/ANSI tape data set is open when a checkpoint is requested, the 
checkpoint is refused. Message IHJ000I is issued, a return code of X'08' is 
returned in register 15, and a reason code of 102 is returned in register 0. 


Nonstandard Tape Labels 


You must provide a routine to process nonstandard labels at restart time. This 
routine must perform input header label processing, because output tapes contain 
the header labels that were written when the data sets were opened (prior to 
checkpoint). 


At restart time, the control program checks the tape to make sure that the first 
record is not a standard volume label. If the first record is 80 bytes in length 
and contains the identifier VOL1 in the first 4 bytes, the label is an IBM 
standard label and the tape is not accepted. Restart issues a message directing the 
operator to mount the correct tape. 


If the tape does not contain an IBM standard volume label, restart’s routine gives 
control to the user’s routine for processing nonstandard labels. When this routine 
receives control, the tape is positioned at the interrecord gap preceding the 
nonstandard label (the tape has been rewound). 
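The volume label test that restart applies can be sketched as follows. This is an illustrative model only; it assumes the label is recorded in EBCDIC (represented here with Python's `cp037` codec), and the function names are hypothetical.

```python
# Assumption: tape labels are recorded in EBCDIC; Python's cp037 codec is used here.
VOL1_EBCDIC = "VOL1".encode("cp037")


def is_ibm_standard_volume_label(first_record: bytes) -> bool:
    """True if the record is 80 bytes long and begins with the identifier VOL1."""
    return len(first_record) == 80 and first_record[:4] == VOL1_EBCDIC


def restart_accepts_nonstandard_tape(first_record: bytes) -> bool:
    """Restart rejects a tape whose first record is an IBM standard volume
    label (the operator is asked to mount the correct tape); otherwise control
    passes to the user's nonstandard label routine."""
    return not is_ibm_standard_volume_label(first_record)
```

Both conditions must hold: a short record that happens to begin with VOL1 is not treated as a standard volume label.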


If your routine determines that the wrong volume is mounted, a 1 must be placed in 
the high-order bit position of the SRTEDMCT field of the unit control block 
(UCB), and control is returned to restart. The control program issues a message 
directing the operator to mount the correct volume. When the new volume is 
mounted, restart repeats the above steps. 


Before returning control to restart, your routine must position the tape at the 
interrecord gap that precedes the initial record of the appropriate data set. This 
applies to both forward and backward read operations. The control program then 
uses the block count shown in the data control block to reposition the tape at the 
appropriate record within the data set. This positioning is always performed in a 
forward direction. If the block count is zero or a negative number, the control 
program does no positioning. (If you want the control program to reposition the 
tape during a restart, normal header label routines (OPEN and EOV) must 
properly initialize the block count field of the data control block during the original 
creation. The block count field of the data control block must not be altered at 
restart time.) 
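The positioning rule can be modeled in a few lines. This is a hypothetical sketch: `TapeDataSet` and `reposition_at_restart` are illustrative names, and a tape is modeled simply as a list of blocks with an index of the next block to read.

```python
from dataclasses import dataclass


@dataclass
class TapeDataSet:
    """Hypothetical model: the blocks of one tape data set and the read position."""
    blocks: list
    position: int = 0  # index of the next block; 0 = gap before the first block


def reposition_at_restart(tape: TapeDataSet, dcb_block_count: int) -> None:
    # The user's nonstandard label routine has left the tape at the
    # interrecord gap that precedes the first block of the data set.
    tape.position = 0
    # The control program positions forward only, and does no positioning
    # at all when the DCB block count is zero or negative.
    if dcb_block_count > 0:
        tape.position += dcb_block_count
```

Because the block count alone determines where processing resumes, a block count that was not maintained correctly (for example, by an EXCP program) leaves the tape at the wrong block.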


If you are using a multivolume tape data set that is described by concatenated DD 
statements with the BLP (bypass label processing) option, the deferred option 
should be coded in the UNIT parameter of those DD statements. This prevents a 
misleading scheduler premount message that asks for the wrong volume to be 
mounted at restart time. 


For additional information about tape labels, see Magnetic Tape Labels and File 
Structure Administration. 
Repositioning and Preserving the Contents of Data Sets 





The control program repositions data sets, but does not preserve their contents. 
After taking a checkpoint, you must ensure that data set contents are not changed 
in such a manner as to make successful postrestart processing impossible. 


Recording Data Set Positioning 


The checkpoint routine records positioning information for user data sets as 
follows: 


• Data sets are repositioned at restart only if they were open when the 
  checkpoint was taken. The OPEN routine positions all data sets opened after 
  the checkpoint was taken. 


• Unit-record data sets (printer, punch, or card reader) are not repositioned at 
  restart. 


• If you use EXCP to process a tape data set open at a checkpoint, ensure that 
  the block count in the data set’s data control block is correct. If the block 
  count is incorrect, the data set may be incorrectly positioned by restart. 


If input/output operations were requested but not begun (for example, if a READ 
macro instruction was executed but the related channel program was not started), 
the checkpoint routine stops any processing associated with the I/O request, 
records the positioning information, and reestablishes I/O operations. 





If I/O operations have already begun, the checkpoint routine waits until they are 
complete before recording positioning information. 


Buffering of Data Set Records 


When QSAM or QISAM is used to process a data set, an indeterminate number of 
virtual-storage buffers may contain data when a checkpoint is taken. If restart at a 
checkpoint occurs, the system’s action depends on whether a card reader or 
another type of device is used to process the data set: 


• Card reader used (QSAM only). Upon restart, existing buffer contents are 
  released. The buffers are reprimed by reading records from the current data 
  set into them. 


• Another device used. Upon restart, the buffer contents are restored to virtual 
  storage, and processing continues normally. It is not possible to predict the 
  time (either before or after the checkpoint) at which a given record will be 
  transferred between a buffer and the recording medium. 


If you close a QSAM or QISAM data set immediately after restarting the program 
at a checkpoint, be aware that the data set may not be restored to the same 
condition it was in when the checkpoint was originally taken. 
Preserving Data Set Contents 


The system does not save and restore the contents of data sets. You must ensure 
that input data sets and system data sets contain all necessary data when restart 
occurs. If a data set on a direct access volume is open at the checkpoint, the data 
set’s label (the DSCB in the VTOC) must have the same location and reflect the 
same extents upon restart as it did when the checkpoint was taken. (See “JCL 
Requirements and Restrictions on Automatic Restart” on page 36.) 


If your program reads records from a data set, updates them, and writes them back 
to their original locations, it may be useless to take a checkpoint before completing 
this processing. If a checkpoint is taken earlier, postrestart processing is 
unsuccessful under these circumstances: 


• Your program updates a record before abnormal termination and repeats the 
  update after restart, and 


• The updated record contents depend on the original contents. 


For example, suppose that the purpose of the update is to exchange the positions of 
two fields in each record. If the record is updated twice, the fields are returned to 
their original positions, and the results are invalid. In a different application, an 
update might place a value in a record field, regardless of the field’s original 
contents. You could then restart the step at a checkpoint taken before or during 
the update procedure, because an updated record would not be changed if updated 
again after restart. 
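The difference between the two kinds of update can be shown with a minimal sketch. The record layout (a two-field tuple) and the function names are hypothetical; the point is only that an exchange depends on the original contents, while an overwrite does not.

```python
def swap_fields(record):
    """Exchange the two fields: the result depends on the original contents."""
    first, second = record
    return (second, first)


def set_status(record, status="POSTED"):
    """Overwrite the second field regardless of its original contents."""
    return (record[0], status)
```

Applying `swap_fields` twice (once before abnormal termination, once after restart) returns the record to its original state, so the result is invalid; applying `set_status` twice gives the same result as applying it once.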


When data set records are processed in an update-in-place manner (records are 
read, changed, and written back into their original location in the data set), bad 
data can be prevented if records updated after the last checkpoint are restored to 
their original state or if your program keeps track of the records that are updated 
and avoids updating them again during restart. 
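One way for a program to keep track of updated records is sketched below. It assumes, hypothetically, that each record has a spare flag field (`done`) that is written back in the same update-in-place as the changed fields, so the flag survives an abnormal termination along with the update itself.

```python
def swap_once(record: dict) -> dict:
    """Exchange fields a and b unless the record is already flagged as done.

    Assumption: "done" is a hypothetical spare flag field that is written back
    together with the updated fields in the same update-in-place.
    """
    if record["done"]:
        return record  # updated before the abnormal termination: leave it alone
    return {"a": record["b"], "b": record["a"], "done": True}


def update_pass(data_set: list) -> list:
    """One update-in-place pass over the data set; safe to rerun after restart."""
    return [swap_once(record) for record in data_set]
```

Rerunning the whole pass after a restart then gives the same result as one uninterrupted pass, even though the exchange itself is not safe to repeat.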


Allocating Devices during Checkpoint/Restart 


When a job step is restarted from a checkpoint, the type of device allocated for the 
data set depends on the specification in the UNIT parameter of the DD statement. 
In addition to ensuring the same device type for a checkpoint and restart, the 
system also attempts to allocate a device with the same optional features present at 
the time the checkpoint was taken. 


• If a device address was specified (for example, UNIT=190), then a device of 
  that type is allocated. It may or may not be the device requested. 


• If a device name was specified, then a device of that type is allocated. 


• If a user-defined name for a single type of device was specified (for example, 
  UNIT=DISK1), then a device of the defined type is allocated. 


• If a name for a mixed group of devices was specified (for example, 
  UNIT=SYSDA), then a device of the same type as that used when the 
  checkpoint was taken is allocated. 


However, if the mixed group includes devices with varying optional characteristics 
(for example, 3340 devices with and without RPS, or DASD that is shared and not 
shared between processors), a device with the same optional characteristics is not 
guaranteed. To avoid this, define generic unit names at system generation time that 
each include only devices with the same optional characteristics. 


In jobs that can restart, you should avoid using generic or esoteric unit names 
(for example, UNIT=TAPE) that contain more than one device type. 
Allocation failure may result during restart if too few units of a specific device type 
are available. For example, if UNIT=TAPE includes more than one type of 
IBM 3400 Tape Drive (that is, 3400-3, 3400-4, and 3400-6), allocation failure may 
occur if the device type available at restart time is not the same device type as 
allocated in the original run of the job. To avoid this allocation failure, define 
esoteric unit names at system generation time that each include only a single device type. 
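An installation could check its unit name definitions for this problem with a simple scan. This is a hypothetical sketch: it assumes the mapping from each esoteric unit name to the device types it contains has already been obtained from the installation's I/O definitions, and the name `TAPE6` is invented for illustration.

```python
from __future__ import annotations


def mixed_type_esoterics(esoterics: dict[str, set[str]]) -> list[str]:
    """Return the esoteric unit names that include more than one device type.

    Jobs that can restart should avoid these names: at restart, allocation may
    fail if too few units of the originally used device type are available.
    """
    return sorted(name for name, types in esoterics.items() if len(types) > 1)


# Hypothetical installation definitions.
ESOTERICS = {
    "TAPE": {"3400-3", "3400-4", "3400-6"},  # mixed: risky for restartable jobs
    "TAPE6": {"3400-6"},                     # single device type
}
```

Here only `TAPE` is reported; a restartable job would request `TAPE6` instead.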


Handling Checkpoint/Restart Errors 


Input/Output Errors 


The checkpoint routine issues return code 0C if it encounters a permanent I/O 
error when: 


• Completing an outstanding VSAM I/O request 
• Quiescing queued access method I/O operations 


  - An exception occurs when QSAM is used and the skip or accept option is 
    specified in the EROPT operand of the data set’s data control block. In 
    this case, code 00 is returned. 


• Writing the checkpoint data set 


When an access method other than QSAM or QISAM is used, your program can 
ensure that input/output operations are complete before it executes the CHKPT 
macro instruction; it can thereby avoid having read or written an erroneous record 
while quiescing. 
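The return code rule for permanent I/O errors can be summarized in a small function. This is a hypothetical model, not system code: the activity names are invented labels for the three situations listed above, while `ACC`, `SKP`, and `ABE` are the DCB EROPT operand values for accept, skip, and abnormal end.

```python
def chkpt_permanent_io_error_rc(activity: str, access_method: str = "", eropt: str = "") -> int:
    """Return code from CHKPT after a permanent I/O error (hypothetical model).

    activity is one of "vsam_request", "quiesce", or "write_checkpoint".
    """
    if activity not in ("vsam_request", "quiesce", "write_checkpoint"):
        raise ValueError(f"unknown activity: {activity}")
    # Exception: quiescing QSAM I/O when EROPT specifies skip or accept.
    if activity == "quiesce" and access_method == "QSAM" and eropt in ("SKP", "ACC"):
        return 0x00
    return 0x0C
```

With EROPT=ABE, or with QISAM, the same quiescing error produces return code 0C.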


If a permanent error occurs when the system reads a checkpoint data set to 
perform a restart, the restart step is terminated abnormally with the system 
completion code 13F. Further automatic restart of the step is not attempted. 
On return to the caller, register 15 contains a return code, and register 0 
contains a reason code describing the error. For more information, see 
Appendix A, “Checkpoint/Restart Codes” on page 71. 


Depending on the type of error, you may want to consult a SYS1.LOGREC or a 
SYS1.DUMPxx data set. 


Checkpoint will make a number of checks to discover user errors. If errors are 
discovered before the first record of the checkpoint entry is written, the checkpoint 
request will be refused; if errors are discovered after the first record of the 
checkpoint entry is written, the checkpoint entry will be invalid. 
Using Checkpoint/Restart with Other Programs 


Job and Job Step Accounting 


The system accumulates processor time used for each job step and job. To access 
these time values, an installation can provide an accounting routine to receive 
control at step initiation, step termination, and job termination. Accounting 
routines are discussed in detail in Supervisor Services and Macro Instructions. The 
relationships between checkpoint/restart and the step time and job time values 
available to the accounting routine are listed below: 


At termination (either normal or abnormal) of an original execution, the step 
and job times accumulated are available to the accounting routine. 


At initiation of the restart step during an automatic restart, the step and job 
times accumulated for the original execution are again available to the 
accounting routine. 


At initiation of the restart step during a deferred restart, the step and job times 
are zero. 


At termination of a restart step and at all subsequent times when the 
accounting routine is given control during the restart execution, the step and 
job times reflect only the time used during the restart execution. 


If the TCBUSER field is to be used as a pointer to accounting information, the 
field is restored at restart time to its value at checkpoint time. 


For example, in an original execution, Step A uses 2 minutes of processor time, and 
Step B uses 3 minutes of processor time and abnormally terminates. At step 
termination, the step time is 3 minutes and the job time is 5 minutes. 


If automatic restart is performed for Step B, a step time of 3 minutes and a job 
time of 5 minutes are again available to the accounting routine at the reinitiation of 
Step B. 


If Step B then uses 4 minutes of processor time and terminates, a step time of 4 
minutes and a job time of 4 minutes are available to the accounting routine at step termination. 


The two values available at the time the restart step is initiated are provided for 
information purposes only. They are not reflected in the step and job running 
times presented at termination time of the restarted job. You need not be charged 
twice for the time accumulated up to the abend. 
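The example above can be reproduced with a small model of what the accounting routine sees. This is a hypothetical sketch: the function is illustrative, and it assumes the restarted step is the last step of the original execution and the only step run so far in the restart execution.

```python
def accounting_times(original_step_minutes, restart_step_minutes, restart_kind):
    """(step, job) processor minutes seen by an accounting routine (hypothetical model).

    original_step_minutes lists the minutes used by each step of the original
    execution, ending with the step that terminated abnormally and is restarted.
    Returns the (step, job) pairs available at three points in time.
    """
    at_original_termination = (original_step_minutes[-1], sum(original_step_minutes))
    if restart_kind == "automatic":
        # The original values are offered again, for information only.
        at_restart_initiation = at_original_termination
    elif restart_kind == "deferred":
        at_restart_initiation = (0, 0)
    else:
        raise ValueError(f"unknown restart kind: {restart_kind}")
    # Later values reflect only the time used during the restart execution.
    at_restart_termination = (restart_step_minutes, restart_step_minutes)
    return at_original_termination, at_restart_initiation, at_restart_termination
```

For Steps A (2 minutes) and B (3 minutes, then 4 minutes after an automatic restart), this yields (3, 5) at the abend, (3, 5) again at reinitiation, and (4, 4) at termination of the restarted step.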


Another point to be considered in your accounting routine is the effect of a restart 
on the step sequence number available to the accounting routine. The following list 
indicates the sequence number presented to the accounting routine under the 
various restart conditions: 
Whenever an automatic restart is performed, the step sequence value accurately 
reflects the position of the step in the job. 







In the case of a deferred restart, the restarting step is the first step of the restart 
job. 
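The two cases can be expressed as a tiny function. This is a hypothetical sketch that assumes steps are numbered from 1, so the first step of a deferred restart job is presented with sequence number 1.

```python
def step_sequence_number(position_in_job: int, restart_kind: str) -> int:
    """Step sequence number presented to the accounting routine for a restarted step."""
    if restart_kind == "automatic":
        return position_in_job  # reflects the step's real position in the job
    if restart_kind == "deferred":
        return 1  # assumption: the restarting step is numbered as the first step
    raise ValueError(f"unknown restart kind: {restart_kind}")
```

An accounting routine that keys records by step sequence number therefore sees a deferred restart of the second step as step 1 of a new job.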


Job Step Time Limit 


The EXEC statement TIME parameter specifies a limit on the processor time used 
by the related step. With any kind of restart, the entire value of the limit specified 
for the job step applies to the restart step. For a deferred restart, you may specify 
a limit different from the limit originally specified. 


If the processor time used by a step exceeds the specified limit while a checkpoint 
entry is written, the entry is invalid and an abnormal termination occurs. A 
deferred restart can then be performed from a preceding checkpoint entry. If it is a 
deferred restart and a sufficient number of checkpoints are taken during the restart 
execution, the invalid checkpoint entry is overwritten by a valid entry. 


Completion of Step or Job Termination at System Restart 


If a step or a job is terminating when system failure occurs, the termination is 
completed during the system restart that the operator may perform after the failure. 
This occurs whether or not the step or job uses checkpoint/restart, although the 
job journal option must be in use. 


If a step other than the last step of a job is terminating when the failure occurs, the 
termination completes during the system restart, and the next step of the job 
initiates. 

If the last step of a job is terminating, or if the job is terminating, all necessary 
terminations are completed. 


If a job requests an automatic restart, then abnormally terminates and system 
failure occurs before the restart termination processing is complete, the processing 
completes during the system restart. 
If a job cannot complete executing because of system failure and the job requested 
job journaling, the job is restarted (warm started) by the system. If the job is 
eligible for automatic restart, the operator is sent message IEF225D asking if the 
job should be restarted. If the job is not eligible for automatic restart or if the 
operator indicated that restart should not be attempted, any scratch or VIO data 
sets the job allocated are deleted and the job is processed based upon the 
FAILURE option specified in the MAIN JCL statement. If 
FAILURE=RESTART is specified, JES3 automatically reschedules the job for 
execution from the beginning of the first step. 


Checkpoint/restart is not available to jobs executing under an ASP main processor. 
(For information about ASP main processors running under JES3, see JES3 
Introduction.) However, JOBSTEP=CHKPNT can be specified on the CLASS 
initialization statement or the MAIN JCL statement to provide a checkpoint of the 
SYSOUT output at the end of each job step. (See Initialization and Tuning Guide 
for information about the CLASS statement and JCL for information about the 
MAIN statement.) You can see the output through the last completely executed 
step if the ASP main processor fails and the job is not restarted (that is, the FAILURE 
option on the CLASS or MAIN statement is PRINT or CANCEL). 


If automatic restart is requested for a job, the job’s I/O is set up by JES3 for the 
first job step and not for the step being restarted. Canceling a job in a dependent 
network prevents successor jobs from executing if they are dependent upon 
successful completion of the canceled job. Any operator commands in the input 
stream of the job step being restarted are not executed. Restart of JES3-controlled 
jobs may be accompanied by message IAT2006 and/or IAT2575. For responses 
to these messages, see JES3 Messages. 


Note: A job that contains generation data sets referenced by a relative generation 
number of +1 or greater and with a disposition of OLD, SHR, or MOD is failed by 
the JES3 interpreter service unless the UNIT parameter is included on the DD 
statement. With UNIT specified, the JES3 interpreter service permits this data set 
to be allocated on a deferred basis. 


The COBOL RERUN clause may be used to provide the COBOL user with linkage 
to checkpoint/restart. Cautions and restrictions on the use of checkpoint/restart 
also apply to the use of the RERUN clause. For further information, see the 
appropriate COBOL reference manual. 


Sort/Merge 


When performing a sort with the IBM Sort/Merge program, you can, by including 
the CKPT parameter in sort control statements, cause checkpoint entries to be 
written and an automatic checkpoint restart to be requested. The job control 
language can be used to request automatic or deferred step restarts or a deferred 
restart at a checkpoint. 
The PL/I user can use automatic and deferred step restart and can also take 
checkpoints and get automatic and deferred checkpoint restarts. To cause a 
checkpoint entry to be written and request an automatic checkpoint restart, the 
user codes CALL PLICKPT. 


Each checkpoint entry in the checkpoint data set is identified by a 
system-generated checkid. A system message (including the checkid) at the 
console notifies the operator that a checkpoint entry is written. 


The organization of the checkpoint data set is always physical sequential, and the 
data set may be written on magnetic tape or a direct access volume. Partitioned 
organization cannot be used. 


A DD statement must be present in the job stream to define the checkpoint data 
set. The DISP parameter in this DD statement is used to specify whether single or 
multiple checkpoint entries are to be written. DISP=(NEW,KEEP) specifies a 
single checkpoint entry; DISP=(MOD,KEEP) specifies multiple checkpoint entries. 


If CALL PLIREST is used in PL/I programs, the CKPTREST macro instruction 
must specify 4092 as an eligible user completion code. For more information, see 
“CKPTREST System Generation Specification” on page 3. 


Virtual Fetch 


Modules managed by virtual fetch cannot issue a checkpoint restart. In addition, 
job steps that call virtual fetch cannot issue a checkpoint restart. 
Appendix A. Checkpoint/Restart Codes 


Return Codes Associated with the CHKPT Macro Instruction 


Some reason codes may also appear in messages associated with abends 13F and 
23F. For more detailed information on the IHJxxxx reason codes, see 
Checkpoint/Restart SVC Logic and System Messages. 


Return 
Code 
(Hex)   Meaning 


00      Successful completion. Code 00 is also returned if the RD parameter 
        was coded as RD=NC or RD=RNC to totally suppress the function 
        of CHKPT. 


04      Restart has occurred at the checkpoint taken by the CHKPT macro 
        instruction during the original execution of the job. A request for 
        another restart of the same checkpoint is normally in effect. If a 
        deferred restart was performed and RD=NC, NR, or RNC was 
        specified in the resubmitted deck, a request for another restart is not 
        in effect. 


The following return codes have reason codes in register 0. 


08      Unsuccessful completion. A checkpoint entry was not written, and a 
        restart from this checkpoint was not requested. A request for an 
        automatic restart from a previous checkpoint remains in effect. 


        One of the following conditions exists. 

        Reason 
        Code    Meaning 

        001     Bad parameter list from caller 
        002     Missing DD card for checkpoint data set 
        003     Insufficient storage for a checkpoint or restart 
        005     Nonzero key length for a checkpoint data set 
        006     Incorrect record format for a checkpoint data set 
        007     Wrong DSORG for checkpoint data set 
        008     Active timer queue element 
        009     RB on chain of unacceptable type 
        010     Open graphics data set on DEB chain 
        011     Current task is a subtask 
        012     Multitask environment 
        013     Outstanding PCLINK 
        014     Outstanding WTOR 
        015     Invalid CHECKID length or format 
        016     Checkpoint data set is not tape or disk 
        019     User’s DCB for checkpoint data set was open for input 
        021     CHKPT data set must have standard label 
        027     Checkpoint entry group does not fit on one volume 
        029     ANSI XLATE on CHKPT data set not allowed 
        032     ISAM data set open with DISP=SHR 
        041     VSAM—create mode and no repositioning 
        043     VSAM—repositioning requested for ESDS with 
                immediate upgrade 
        044     VSAM—direct processing of an RRDS in create mode 
        045     VSAM—GSR option in use 
        046     VSAM—CBUF processing is in use or was used 
        047     VSAM—CBIC option in use 
        048     VSAM—media manager in use 
        051     Active SPIE 
        054     Primary address ID and/or secondary address ID is not 
                home address ID 
        057     Checkpoint DD statement is concatenated 
        061     Secondary addressing mode 
        069     Too much VSMLIST data—user’s storage is fragmented 
                beyond checkpoint’s capabilities 
        075     STOW encountered a full directory on the checkpoint 
                data set 
        086     An abend occurred during a checkpoint or restart 
        102     ISO/ANSI data set open at checkpoint is not supported 
        206     VTAM data set is open 
        208     Checkpoint data set is not empty 
        209     Concurrent open on checkpoint data set 
        210     DISP=SHR for checkpoint data set 
        211     Checkpoint data set is not secure 
        213     Checkpoint data set is a subsystem DS 
        214     New checkpoint data set on shared DASD 
        224     SAM-SI 
        250     IMAGELIB DCB open 
0C      System-related or I/O-related problem: 

        • An error occurred during the handling of a system request, such as 
          ESTAE, SETLOCK, or PURGE; or 
        • An I/O error occurred in processing the CHKPT request; or 
        • A VSAM error was detected while preparing a VSAM data set for 
          the CHKPT request. 

        004     Open failure on checkpoint data set 
        020     I/O error during open of CHKPT data set 
        022     Error reading or writing a SWA block control 
        023     I/O error during write on CHKPT data set 
        026     I/O error during STOW request 
        030     I/O error on QSAM or BSAM data set with EROPT not 
                equal accept 
        042     VSAM—repositioning error 
        055     VSMREGN failed 
        056     VSMLIST failed 
        059     An unexpected return was received by IHJGLU00 
        186     VSAM—close error; error code included 
        195     VSAM—no storage available 
        200     Purge failed 
        202     Setlock failed 
        204     WIJOURN failed—this checkpoint is unavailable for 
                restart 
        205     WIJOURN failed—all checkpoints are unavailable for 
                restart 
        207     I/O error using subsystem interface 
        240     ESTAE failed 
        241     VSAM—indeterminate error 
        242     VSAM—machine check 
10      Successful completion with possible error condition. The task has 
        control, by means of an explicit or implied use of the ENQ macro 
        instruction, of a serially reusable resource; if the task terminates 
        abnormally, it will not have control of the resource when the job step 
        is restarted. Your program must, therefore, restore the enqueues. 

        Additional information regarding explicit and implicit use of the ENQ 
        macro instruction is found under “Requesting Serially Reusable 
        Resources” on page 26. 

        000     Possible enqueue error 
        017     Insufficient storage to check enqueues; possible error 
        018     GQSCAN found an abnormal condition 
        028     I/O error during SYNCDEV due to a user data set 

14      Unsuccessful completion. Internal error detected. 

        058     A bad parameter list was passed to IHJGLU00 
        090     Internal program error 
        091     Error building a message 
18      Error encountered restoring purged I/O. The checkpoint was 
        successful, but because of the error that was detected, restart may not 
        be possible. In addition, the job now in progress may fail because of 
        I/O errors. 

        215     Unsuccessful attempt to restore user’s I/O during 
                checkpoint. 


When one of the errors indicated by code 08, 0C, 10, 14, or 18 occurs, the system 
prints an error message at the operator’s console. The message contains a code 
that further identifies the error. The operator should report the contents of the 
message to the programmer. 
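A program that issues CHKPT might log the outcome with a small lookup such as the one below. This is a hypothetical helper, not an IBM interface: the short meanings are condensed from the return code table, and it assumes the reason code from register 0 is displayed as a three-digit decimal number, as the tables in this appendix show it.

```python
# Hypothetical helper: interpret the CHKPT return code (register 15) and
# reason code (register 0) using the meanings listed in this appendix.
CHKPT_RETURN_CODES = {
    0x00: "successful completion",
    0x04: "restart has occurred at this checkpoint",
    0x08: "unsuccessful completion; checkpoint entry not written",
    0x0C: "system-related or I/O-related problem",
    0x10: "successful completion with possible error condition",
    0x14: "unsuccessful completion; internal error detected",
    0x18: "error encountered restoring purged I/O",
}

# Return codes 08 through 18 carry a reason code in register 0, and each of
# them also causes an error message at the operator's console.
CODES_WITH_REASON = {0x08, 0x0C, 0x10, 0x14, 0x18}


def describe_chkpt_result(r15: int, r0: int = 0) -> str:
    meaning = CHKPT_RETURN_CODES.get(r15, "return code not documented here")
    text = f"CHKPT RC X'{r15:02X}': {meaning}"
    if r15 in CODES_WITH_REASON:
        # Assumption: reason codes are shown as three-digit decimal numbers.
        text += f" (reason code {r0:03d})"
    return text


def operator_message_expected(r15: int) -> bool:
    return r15 in CODES_WITH_REASON
```

For the ISO/ANSI tape case described in Chapter 4, `describe_chkpt_result(0x08, 102)` reports an unsuccessful completion with reason code 102.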


Completion Codes Issued by Checkpoint/Restart 


Completion 
Code 
(Hex)   Meaning 


13F     System abend code 13F indicates that an error occurred during 
        performance of a checkpoint restart. If a SYSABEND card is included 
        in the job, a dump is produced, and the contents of the system control 
        blocks, as shown in the dump, are unpredictable. 

        One of the following conditions exists. 

        Reason 
        Code    Meaning 

        003     No storage 
        024     More than five volumes but JFCB does not point to a 
                JFCBX. 
        031     I/O error during read from CHKPT data set 
        033     Cannot reposition to a tape data set or record because the 
                block count in the DCB is negative. The tape can be 
                labeled NL or BLP, and the data set is open for RDBACK. 
        034     Missing DD statement 
        035     Wrong length record during read on CHKPT data set 
        040     I/O error reading volume header record on a standard 
                label tape during restart 
        050     A volume serial number is not the same at restart as it 
                was at checkpoint 
        052     LPA or nucleus module has been deleted since checkpoint 
        053     LPA or nucleus module has been moved since checkpoint 
        058     A bad parameter list was passed to IHJGLU00 
        059     An unexpected return was received by IHJGLU00 
        063     I/O error while repositioning a tape to a record within a 
                data set 
        074     Error reading DSCB 
        076     Extents in a data set’s DEB do not equal those in the data 
                set’s DSCB 
        077     ISAM open error 
        078     Checkpoint was not on a 31-bit supervisor, but restart is. 
        079     A non-BSAM or non-QSAM data set was changed to 
                dummy at restart 
        080     A compatibility interface data set was changed to dummy 
                at restart 
        081     Checkpoint task and restart task V=R mismatch 
        082     TCAM is not active at restart, but a TCAM data set was 
                open at checkpoint 
        086     User error causing an abend 
        087     Record order on checkpoint data set incorrect—detected 
                by restart 
        088     Storage not allocated as expected—detected by restart 
        091     Error building a message 
        092     An error occurred while processing a partitioned data set 
        094     Nucleus routines or tables have moved or been deleted 
                since checkpoint. LPA module entry points, which reside 
                in the nucleus table, have been altered. 
        096     An error occurred while deleting a member from a 
                partitioned data set 
        097     I/O error in nonstandard label routine during restart 
        098     No UCB available during restart 
        099     No DSAB found during restart 
        100     MSS error occurred during mount processing. See SSCR 
                SSI. 
        101     I/O error reading vol label on DASD 
        103     A SSCR SSI checkpoint record was written, but the 
                corresponding DCB is not in the open data set table at 
                restart 
        104     I/O error while repositioning a tape volume to a data set 
        181     VSAM—during preformat 
        182     VSAM—during verify 
        183     VSAM—during put 
        184     VSAM—during index put 
        185     VSAM—open error; error code included 
        190     VSAM—unable to get cluster information 
        191     VSAM—unable to mount volumes 
        193     VSAM—DS was in create mode during checkpoint, but 
                not restart 
        194     VSAM—IMM update was changed 
        195     VSAM—no storage available 
        196     VSAM—DS was modified between checkpoint and 
                restart 
        197     VSAM—CBUF for cross-region sharing 
        198     VSAM—DS was extended to a new volume 
        199     VSAM—BLDVRP error 
        201     PGFIX failed 
        202     Setlock failed 
        203     Illegitimate call to restart 
        207     I/O error using subsystem interface 
        216     Region allocation at restart is incorrect 
        219     Password error on a password-protected tape 
        220     DSCB address has changed since checkpoint 
        221     Wrong password given for password-protected data set 
        222     Tape header label has changed since checkpoint 
        240     ESTAE failed 
        241     VSAM—indeterminate 
        242     VSAM—machine check 
        243     VSAM—bad SSCR 


23F     System abend code 23F indicates a security violation was detected 
        during restart. 

        One of the following conditions exists. 

        251     A data set that was not a checkpoint data set at 
                checkpoint time was found to be open to a secure 
                checkpoint data set at restart. 
        252     In a job using more than one checkpoint data set, one of 
                the checkpoint data sets (not the one used for restart) 
                was deemed not secure. 
        255     Security violation—user not authorized to access a 
                RACF-protected data set at restart 


2F3     System abend code 2F3 indicates that a job was executing normally 
        when system failure occurred. 


33F     System abend code 33F indicates that a user data set (such as one on a 
        3480 subsystem) could not be synchronized. An I/O error occurred while 
        writing data from a previous channel program to the media (deferred 
        write error). 
Appendix B. End-of-Volume Checkpoint Routine 


You can specify, in the related data control block exit list, the address of a routine 
that receives control when end-of-volume is reached in processing a physical 
sequential data set (BSAM or QSAM). (For information about forming an exit list 
and coding the EOV exit list for physical sequential data sets, see Data Facility 
Product: Customization.) The routine is entered after a new volume is mounted and 
all necessary label processing is completed. If the volume is a reel of magnetic 
tape, the tape is positioned after the tape mark that precedes the beginning of the 
data. The end-of-volume exit routine may take a checkpoint by issuing the 
CHKPT macro instruction. The job step can be restarted from this checkpoint. 
When the job step is restarted, the volume is mounted and positioned as upon entry 
to the routine. 


The end-of-volume exit routine returns control in the same manner as any other
data control block exit routine. Restart becomes impossible if changes are made to
the link pack area or nucleus after the checkpoint is taken. When the step is
restarted, the virtual storage addresses of end-of-volume modules must be the same
as when the checkpoint was taken.
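The exit-list arrangement described above can be sketched in assembler language. This is a minimal, non-reentrant sketch under stated assumptions, not a tested program. The DD names INPUT and CHKPTDD, all labels, and the register-saving scheme are hypothetical. Check the exit list code X'06' (end-of-volume exit) and the DCB operands against Data Facility Product: Customization before use. The sketch does not test the CHKPT return code left in register 15; those codes are listed in Appendix A.

```asm
* Hypothetical sketch: QSAM input DCB whose exit list names an EOV exit
INDCB    DCB   DDNAME=INPUT,DSORG=PS,MACRF=(GM),EXLST=EXITLST
*                                   (EODAD= routine not shown)
         DS    0F
EXITLST  DC    X'86',AL3(EOVEXIT)   X'06' = EOV exit, X'80' = last entry
*
* Checkpoint data set DCB; CHKPT opens it if it is not already open
CHKDCB   DCB   DDNAME=CHKPTDD,DSORG=PS,MACRF=(W),RECFM=U
*
* EOV exit: entered after the new volume is mounted and labels are
* processed.  R14 = return address, R15 = entry address of routine.
EOVEXIT  DS    0H
         USING EOVEXIT,15
         STM   12,14,EOVSAVE        Save R12 (new base) through R14
         LR    12,15                Base register that survives CHKPT
         DROP  15
         USING EOVEXIT,12
         CHKPT CHKDCB               Take checkpoint; return code in R15
         LM    12,14,EOVSAVE        Restore caller's R12 through R14
         BR    14                   Return as any other DCB exit does
EOVSAVE  DS    3F                   Private save area (non-reentrant)
```

CHKPT overwrites register 15 with its return code. For that reason, the sketch moves its base register to register 12 before it issues the macro. It then restores the caller's registers 12 through 14 before returning through register 14.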
List of Abbreviations


ACB      access method control block
ANSI     American National Standards Institute
APF      authorized program facility
AVR      automatic volume recognition
BDAM     basic direct access method
BISAM    basic indexed sequential access method
BPAM     basic partitioned access method
BSAM     basic sequential access method
DASD     direct access storage device(s)
DCB      data control block
DOS      disk operating system
DSCB     data set control block
ESPIE    extended specify program interruption exits
EXCP     execute channel program
FCB      forms control buffer
GDGNT    generation data group name table
I/O      input/output
IPL      initial program load
ISAM     indexed sequential access method
ISO      International Organization for Standardization
JFCB     job file control block
JFCBE    job file control block extension
MCP      message control program
MSS      mass storage system
PDS      partitioned data set
QISAM    queued indexed sequential access method
QSAM     queued sequential access method
RPS      rotational position sensing
SAM      sequential access method
SCT      special characters table
SIOT     step input/output table
SSCR     subsystem checkpoint record
SSI      subsystem interface
SWA      scheduler work area
TCAM     telecommunications access method
TCB      task control block
TTR      track record
UCB      unit control block
UCS      universal character set
VIO      virtual input/output
VSAM     virtual storage access method
VTOC     volume table of contents
Index

abbreviations, list of 81
abend codes 32, 53
ABEND macro 32 
abnormal termination 
accounting 66 
automatic restart 34, 37 
BSAM 55
checkpoint entry 17 
completion of termination 66 
deferred checkpoint restart 42 
deferred step restart 40, 41 
job 1 
message sequence 31, 32 
passed data set 45
serially reusable resources 26, 74 
SYSABEND data set 55 
SYSOUT data set 56
SYSUDUMP data set 55 
update data set 63 
with CANCEL 21 
with CKPTREST 3, 21 
with ENQ 26 
work data set 60 
ACB 81 
access methods 
BDAM 27, 52
BISAM 27 
BPAM 
See BPAM 
BSAM 
See BSAM 
EXCP 8, 62 
ISAM 45, 53 
QISAM 
See QISAM 
QSAM 
See QSAM 
SAM 45 
VSAM 
See VSAM 
accounting routine 66 
acronyms, list of 81 
allocation, device 
direct access 56 
failure 64 
last volume 35 
restart 33, 36, 38 
UNIT parameter 63 
virtual storage 38 
allocation, space 
after checkpoint 45 
deallocate 41, 44, 45 
dynamic 29, 44 
job 3 


reallocate 35 
replace 45 
restart 57 
secondary 12, 72
shared DASD 11 
AMP parameter 58 
ANSI 81 
ANSI standard labels 12, 61 
APF 81 
APF (Authorized Program Facility) 
unauthorized user 5 
ATTACH macro 25 
Authorized Program Facility 
See APF 
automatic checkpoint restart 
abnormal termination 26 
accounting 66 
cancel 21, 26 
caution 36 
data set disposition 12, 28, 37
described 1, 37
during sort/merge 68 
during termination 12 
example 34 
JCL requirements 36 
JES3 68 
job deck 37 
job journal 3-4 
RD parameter 31, 42 
requesting 34 
resource variations 36 
step failure 34 
suppressed 57 
SWA 37 
SYSCHK DD statement 49 
SYSIN 56 
VIO data set 29 
with CHKPT macro 20 
automatic error options (EROPT) operand (DCB 
macro) 64 
automatic step restart 
abnormal termination 32 
accounting 66 
after checkpoint restart 36 
AVR (automatic volume recognition) 38 
data set disposition 37-38 
described 1 
during sort/merge 68 
eligible 3 
ENQ macro 26 
establishing checkpoints 7-8 
example 35 
EXTRACT macro 24 
ineligible 28 
initiated 38 
JCL requirements 36 
JES3 68 
JES3 restriction 57 
job deck 37
job journal 3 
messages 31 


MOD data set 

See MOD data set 
operator options 32 
PL/I 69 


RD parameter 2, 31, 34 
requesting 34 
resource variations 36 
restriction 41 
serially reusable resource 26 
SETPRT macro 24 
step failure 34 
suppressed 57 
SWA 37 
SYSABEND 55 
SYSUDUMP 55 
AVR 81 


basic direct access method 
See BDAM 
basic direct access method (BDAM) 27 
basic indexed sequential access method 
See BISAM 
basic indexed sequential access method (BISAM) 27 
basic partitioned access method 
See BPAM 
basic sequential access method 
See BSAM 
BDAM 81 
BDAM (basic direct access method) 27, 52
BISAM 81 
BISAM (basic indexed sequential access method) 27 
block count field 61 
BPAM 81 
BPAM (basic partitioned access method) 
checkpoint considerations 8 
DCB parameters 10 
SYSCKEOV DD statement 10 
UNIT parameter 12 
BSAM 81
BSAM (basic sequential access method) 
checkpoint/restart data set 55 
DCB parameters 10 
end-of-volume 2, 79
multivolume 
See multivolume data set 
repositioning 
See repositioning 
UNIT parameter 12 
using 8 
VIO data set 29
CALL PLICKPT, in PL/I programs 69
CANCEL operand 
at automatic restart 37 
example 26 
instruction 21 
restriction 21 
use 26 
cataloged procedure 40, 44, 47, 48 
cataloging 
a generation data set 52 
chained scheduling 
DCB option 11 
channel programs 
DCB option 11 
CHECK macro 27, 52
checkid 
address operand in CHKPT macro 21 
defined 21-22 
duplicate 20 
how to specify 21-22 
IDADDR 22 
IDLNG 22 
in message at restart 32 
in partitioned data set 16, 19, 22
in sequential data set 19, 22 
length operand in CHKPT macro 22 
not specified 49 
primary identification 19 
programmer-specified 2, 22
S operand in CHKPT macro 22 
secondary identification 20 
system generated 2, 19-20 


unique 19 
checkpoint 
coding 7 


entry on tape 7, 16
how to establish 8 
nonopened DCB passed to 16 
opened DCB passed to 16 
routine 11, 12, 56 
when to code 7 
when to take 54 
checkpoint at end-of-volume 8 
described 2 
return codes from 8
suppressing 47 
use of 9 
checkpoint data set 
alternate 19 
closing the 16 
content 16 
DCB for 10 
DD statement 12, 36 
defined 2, 49
disposition 16, 18
DOS 60
end-of-volume 12
generation data set used as 13
I/O errors 64
in PL/I 69
labels for 21 
multivolume 

See multivolume data set 
opening 12-17 
partitioned 

See partitioned data set 
RACF-protected 4 
repositioning 

See repositioning 
resides on 21 
security 4-5 
sequential 16, 19, 22, 69 
space allocation 12 
storage estimates for 13 
SYSCHK DD statement 3 
termination 12 
use of 16-26, 49 
VIO data set 29 
with CHKPT 49 

See also CHKPT macro 


checkpoint entry 


at checkpoint restart 35 
described 2 
end-of-volume 17 
how written 16 
identified 19 
restrictions 27 
when left intact 21 
when written 21 
See also CHKPT macro 


checkpoint identification 


See checkid 


checkpoint routine 12 
checkpoint/restart 


components 1 

defined 1 

end-of-volume 8, 9

ensure restart 17 

in PL/I 69

messages 12, 17, 20

restarts 31-49 

system generation requirements 3 


CHKPT macro 


described 2 
end-of-volume exit routine 2, 10 
how to code 21
information recorded 7 
instruction 1 
return codes associated with 71 
suppressing action of 47 
See also RD parameter 
use of 8-16, 34-49 
use of in exit routines 10 
use of with other macro instructions 24-25 


CHKPT macro, return codes 71 
CHKPT return codes 24 

CKPT parameter 68 
CKPTREST macro 3 
closing checkpoint data set 16
COBOL RERUN clause 68 
codes 
issued by checkpoint/restart 75 
returned in register 71-78 
to add or delete completion 3 
completion codes 
issued by checkpoint/restart 75 
to add or delete 3 
COND parameter 41, 46 
control blocks 


ACB 58 
DCB 

See DCB 
DSCB 45, 63 
JFCB 7 
JFCBE 25 
TCB 24 
UCB 61 


core storage 
See virtual storage 
cross memory 
restrictions on checkpoint 29 




DASD 81 
DASD (direct access storage devices) 
shared 12, 28 
UNIT parameter 63 
data control block 
See DCB 
data control block (DCB) 
See DCB 
data definition (DD) statement 
See DD statement 
data set 61
BDAM 27, 52 
BSAM 55 
checkpoint 
See checkpoint data set 
deallocated 45 
direct access 
See direct access 
disposition at automatic restart 37-38 
disposition at deferred checkpoint restart 40 
disposition at deferred step restart 40, 42 
dummy 45 
entry-sequenced 59 
extents 45 
generation 41, 52
ISAM 45, 53 
key-sequenced 59 
LINKLIB 36 
MOD 35 
MSS 54 
partitioned 54 
See also partitioned data set 
passed 41
password protected 4 
preallocated 57 
QISAM 62 
QSAM 62
relative record 60 
repositioning 
See repositioning 
security 4, 5
SYSABEND 55 
SYSIN 56 
SYSOUT 56 
SYSUDUMP 55 
tape 8, 10, 21, 38, 41, 61, 62 
TCAM 57 
temporary 37, 57
user 7 
VIO 4, 29, 33, 68
VSAM 8, 28, 38, 57, 58
work 60 
data set control block (DSCB) 45, 63 
DCB 81 
NCP=2 11 
nonopened, passed to checkpoint 16 
OPTCD=C 11 
OPTCD=W 11 
optional parameters 11 
RECFM=UT 11 
DCB (data control block) 
data set security 5 
exit list 79 
for checkpoint data set 21 
in CHKPT 21 
parameters for checkpoint data set 10 
user requirement restriction 10 
with SETDEV 25 
DCB macro 
for checkpoint data set 10 
DD statement 
AMP parameter 58 
BSAM 9 
CHKPT parameter 8, 9, 11 
DATA type 9, 45, 56 
DCB parameter 10, 45 
DEFER parameter 12 
definition 7 
DEVD parameter 10 
DISP parameter 9, 12, 35, 45, 49, 69
disposition 12, 17, 33, 37-42, 69 
DUMMY 45 
end-of-volume 9 
examples 9, 12, 18-19, 36, 43, 49 
for automatic checkpoint 
restart 7, 17, 34
for checkpoint data set 9, 12, 34
for deferred checkpoint 
restart 19, 42, 46 
for deferred step restart 19, 40-42 
JOBLIB 49 
LABEL parameter 12 
OPTCD parameter 12 
partitioned data set 54
passed data set 41
PL/I 69 
QNAME parameter 57 
RECFM parameter 11 
restrictions 10, 12 
RLSE parameter 12 
shared DASD 28 
SYSABEND 55 
SYSCHK 
See SYSCHK DD statement 
SYSCKEOV 10 
SYSIN 56 
SYSOUT parameter 55 
SYSUDUMP 55 
UNIT parameter 12, 38, 45, 63 
VOL parameter 28, 36, 41, 45 
VSAM 59
with EXTEND 35 
with OUTINX 35
DDNADDR parameter 22 
deferred checkpoint restart 
accounting 66 
checkpoint entries 16, 19 
data set disposition at 42-46 
described 2 
how system works at 46 
how to request 42 
JCL requirements 44 
job step time limit 67 
messages issued 39 
operator considerations 39 
PL/I 69 
resource variations allowed in 46 
RESTART parameter 
See RESTART parameter 
restrictions 46 
SYSCHK DD statement 3, 49 
SYSIN data set 56
VIO data set 29 
deferred step restart 1 
checkpoint entries 16, 19 
data set disposition at 40-42 
described 2 
generation data set 41, 52
how to request 40 
JCL requirements 41-42 
messages issued 39 
resource variations allowed in 42 
RESTART parameter 
See RESTART parameter 
restrictions 36 
use of CHKPT 21-22 
VIO data set 29 
delimiter (/*) statement 45
DEQ macro 
at demount facility 28 
devices 
changing at restart 36
deferred checkpoint restart 46
deferred step restart 42 
for data sets at checkpoint/restart 63 
direct access 
BDAM 27 
checkpoint data set 10, 12, 21 
DCB parameters 10 
DEVD parameter 10 
device, use of 8, 36, 56
duplicate DSCB 38 
end-of-volume 2, 12 
ISAM data set 53 
MOD data set 
See MOD data set 
multivolume data set 
See multivolume data set
partitioned data set 
See partitioned data set 
PL/I 69 
processing 8, 52
serially reusable resources 26 
shared DASD 11, 28 


storage 28 
SYSIN 56 
SYSOUT 56 


UNIT parameter 12 
updating partitioned data set 54 
volume 8 
VSAM data set 60, 63 
directory, preserving contents of 54 
disk operating system (DOS) 
checkpoint records 60 
DISPLAY command 47 
disposition 
at automatic restart 37 
at deferred checkpoint restart 42 
at deferred step restart 41-42 
for PL/I 69 
of checkpoint data set 12
DOS 81 
DOS (disk operating system) 60 
DSCB 81 
DSCB (data set control block) 45, 63 
dummy data set 29
DUMMY parameter 45 
dynamic allocation 29 
end-of-sequential retrieval (ESETL) macro 27
end-of-volume
checkpoint at 2, 8-9
deferred checkpoint restart 45
during checkpoint 17
EOV routine 61
exit routine 2, 10
See also exit routine
message 12, 17
SYSCKEOV DD statement 10
use of CHKPT 9, 10, 12
use of RD parameter 48 
end-of-volume checkpoint routine 79 
ENQ macro 26 
entry-sequenced data set 59 
EOV checkpoint routine 79 
EOV macro 79 
EROPT (automatic error options) operand (DCB 
macro) 64 
error 
input/output 64 
ESETL (end-of-sequential retrieval) macro 27 
ESPIE 81 
examples 
canceling automatic restart 26 
CHKPT=EOV parameter 9 
DD statements for checkpoint data set 12 
ensure restart 18-19 
getting TCB data after restart 24 
recording identification 20 
requesting automatic checkpoint restart 34 
requesting automatic step restart 35 
requesting deferred checkpoint restart 43 
requesting deferred step restart 40 
requesting resource after restart 27 
SYSCKEOV DD statement 10 
EXCP 81 
EXCP access method 8, 62 
EXEC statement 
automatic checkpoint restart 33, 36
COND parameter 41 
deferred checkpoint restart 42, 44, 46 
deferred step restart 41 
examples 34, 40, 43 
operator considerations 33 
PGM parameter 41 
RD parameter 
See RD parameter 
SYSCHK DD statement 42, 49 
TIME parameter 67 
exit routine 
end-of-volume 
checkpoint in 10, 45 
described 2, 79
EOV function 8 
label routine 61 
RD parameter (see RD parameter)
register contents 79 
explicit request for ENQ 26 
extents 
DSCB 45 
for deferred checkpoint restart 45 
SYSIN data set 56
EXTRACT macro 24 
FCB 81
file-protect message 39 
forms control buffer 
See FCB 
forms control buffer (FCB) 24 


GDGNT 81 
GDGNT (generation data group name table) 7, 52
generation data group name table 
See GDGNT 
generation data set 53 
in checkpoint 52 
in restart 41
JES3 restriction 68 
generation data set used as checkpoint data set 13 
generation, system 3 
global shared resources (GSR) 29 
GSR (global shared resources) 29 
HOLD reply 33
at automatic restart 37 
I/O 81
IDADDR parameter 23 
IDLNG parameter 23 
implicit request for ENQ 27 
indexed sequential access method 
See ISAM 
initial program load 
See IPL 
initiators 33 
input work queues 46 
input/output errors 64 
IPL 32, 39, 81
ISAM 81 
ISAM (indexed sequential access method) 
data set 53
interface 45 
ISO 81 
ISO/ANSI tape labels 
See ANSI tape labels 
JCL (job control language)
automatic restart 36 


deferred checkpoint restart 36, 44 


deferred step restart 41 


JES2 4 
JES3
checkpoint/restart 68
for generation data sets 57, 68
job journaling 4
JFCB 81


JFCB (job file control block) 7 


JFCBE 81 


JFCBE (job file control block extension) 7, 25 


job deck reinterpretation 37 


job entry subsystem 
See JES2, JES3 

job failure 
abend message 32
automatic restart after 34, 47
during checkpoint entry 17
possible cause 38
saving checkpoint data 10
termination during 67
use of MOD 10
use of RD parameter
See RD parameter
with JES3 4, 68
with job journaling 4, 68
job file control block (JFCB) 7 


job file control block extension (JFCBE) 7 


See also JFCBE 
job journal 
ABEND 31 


automatic checkpoint restart 4 
automatic step restart 4 


system restart 4, 67 
JOB statement 


automatic checkpoint restart 34 


automatic step restart 34 


checkid 22 


deferred checkpoint restart 42 
deferred step restart 40-42 


RD parameter 


See RD parameter 
RESTART parameter 
See RESTART parameter 


job step 


message at abnormal termination 31 
termination at system restart 67 


time limit 67 
JOBLIB DD statement 
journaling 


See job journal 
labels


for checkpoint data set 12, 21 


ISO/ANSI tape 61 
nonstandard tape 61 
restrictions 21 
standard 12, 21, 61
standard volume 61 
library 
See system library 
link library 37 
link pack area 
DCB exit 79 
list 79 
LINKLIB data set 36 


local shared resources (LSR) 29 
LSR (local shared resources) 29 




macros, checkpoint/restart 
ABEND 3, 34
ATTACH 25 
CHECK 27, 52 
CHKPT 
See CHKPT macro 
CKPTREST 3, 34
DCB 10 
ENQ 26, 27, 74 
EOV 79 
ESETL 27 
EXCP 8 
EXTRACT 24 
OPEN 12 
PCLINK 25 
READ 62 
RELEX 27 
RESERVE 28
SETDEV 25 
SETL 27 
SETPRT 24 
STIMER 25 
STOW 16, 54
WAIT 27, 52 
WRITE 10, 27, 53 
WTOR 25, 72 
magnetic tape 
access methods to process 
ANSI labels 61
checkpoint entry 8, 17
devices 64 
DOS files 60 
MOD data set 
See MOD data set 
multivolume 
See multivolume data set





nonstandard labels 61 
processing 8 
repositioning 
See repositioning 
restriction 49 
standard labels 12, 21, 61 
use of 21 
volumes 12, 28, 41, 49
7-track 10 
mass storage system 
See MSS 
Mass Storage System (MSS) 54 
MCP 81 


MCP (message control program) region, for TCAM 57 


member 
checkpoint entry 16, 19 
deleted 55 
DSNAME parameter 49 
memory 
See virtual storage 
message control program region for TCAM 57 
messages 
ABEND 31 
at restart 4, 31
authorization for restart 32 
checkpoint ID 20 
checkpoint not taken 12, 17 
data set security 32 
during automatic checkpoint restart 32 
during deferred checkpoint restart 39 
EOV 8 
error 71-78 
mount 31-39 
password 32-39 
replies 32-33 
secure volume 39 
virtual storage requirements 31, 39
when checkpoint successful 20 
MOD data set 
automatic step restart 37 
checkpoint entries 16 
during automatic restart 33 
during automatic step restart 35 
during deferred step restart 41 
example 17 
PL/I 69 
SYSCKEOV DD statement 10 
with EXTEND 35 
with OUTINX 35 
mounting VSAM volumes 38 
MSS 81 
MSS (Mass Storage System) 54 
multivolume data set 
BSAM 9 
CHKPT JCL parameter 9 
during deferred checkpoint restart 49 
end-of-volume 2, 9
QSAM 9 
SYSCHK DD statement 49 
with SWA 37 
NO reply 33
nonspecific type volume 28, 33, 38
nonstandard tape labels 61 
OPEN macro 12
open routine 62 
opening checkpoint data set 12-17 
operating system, special features 28 
operator considerations 
automatic restart 
message sequence 31 
options 32 
deferred restart 
considerations 39 
message sequence 39 


PAGE pack 33 
partitioned data set 
adding members 54 
checkid length in CHKPT macro 22 
checkpoint identification for 19 
considerations for 54 
deleting members 55 
directory 54 
members 
See member 
repositioning 
See repositioning 
restriction 49 
unsuccessful completion 76 
updating 54 
updating members 54 
with SYSCHK DD statement 49 
PC routine, restriction 30 
PCLINK, restriction 25, 30 


PDS 81 

PGM parameter 40-41, 46 
PL/I 69 

PLICKPT 69 


preallocated temporary data set 57 
printer 
repositioning at restart 62 


1403 24 
3203-5 24 
3211 24 
3262 24 
4245 24 
4248 24 
processor time 66


programmer-specified checkpoint, identification 21-22 


programs 
APF 5 
ASP 68 
AVR 38 
COBOL 68 
DOS 60 
JES2 4 
JES3 4, 68 
PL/I 69 
RACF 4 
sort/merge 68 
TCAM 57 

QISAM 81 


QISAM (queued indexed sequential access method) 


data set 62
for checkpoint restart 8
I/O errors 64
ISAM data set 53 
repositioning 
See repositioning 

with ENQ 27 

QSAM 81 

QSAM (queued sequential access method) 
CHKPT JCL parameter 9 
data set 62
end-of-volume 2, 79
for checkpoint restart 8
I/O errors 64
multivolume 2, 9
repositioning 

See repositioning 

VIO data set 29 

queue, input work 37 

queued indexed sequential access method 
See QISAM 

queued sequential access method 
See QSAM 
RACF
user data set security 4 
RD parameter 
described 2 
for automatic restart 34 
how to code 47-48


suppressing action of CHKPT macro 2 


READ macro 62 
reader, restart 62 
record
catalog 59
update after checkpoint 63 
region 39 
REGION parameter 39 
register contents 
completion codes 71 
relative record data set 60 
RELEASE command 
at automatic restart 37 
initiate restart 33 
RELEX macro 
issued before CHKPT macro 27 
repositioning 
at checkpoint restart 32, 35 
at deferred step restart 39, 41 
BSAM data set 55 
card reader 62 
data set 35, 62
direct access data set 35, 62 
MOD data set 35, 41 
partitioned data set 54 
password protected 32, 39 
QISAM data set 62 
QSAM data set 62 
restriction 29 
routine 62 
shared resources 29 
SYSIN data set 56 
SYSOUT data set 56 
tape data set 61
user data set 62 
VSAM data set 57, 58
RERUN clause 68 
RESERVE macro 28 
resource variations 
automatic checkpoint restart 36 
automatic step restart 36 
deferred checkpoint restart 46 
deferred step restart 42 
resources, serially reusable 
See serially reusable resources
restart 
at end of volume 8 
automatic checkpoint 
See automatic checkpoint restart 
automatic step 
See automatic step restart 
deferred checkpoint 
See deferred checkpoint restart 
deferred step 
See deferred step restart 
how to ensure 17
in PL/I 69
messages issued at 31, 39
of generation data sets 52 
of MOD data set 35
of SYSOUT data sets 56 
of VSAM data sets 58 
repositioning data sets at 62 
routine 61 
SYSIN data set 56
restart definition parameter
See RD parameter 
RESTART parameter 
described 3 
for deferred checkpoint restart 42 
for deferred step restart 41 
how to code 48
restart reader 62 
restart, types of 1 
restrictions 
cross memory 29 
deferred checkpoint restart 46 
MVS/System Product Release 3 69 
standard volume label 61 
when to issue RELEX macro 27 
when to issue WRITE macro 27 
with control of resources 27
return codes 
issued by checkpoint/restart 71-78 
to add or delete 3
return codes, CHKPT macro 71 
RLSE parameter 12 
routine 
accounting 66 
checkpoint 7, 11, 12, 62 
end-of-volume exit 
See exit routine 
for nonstandard tape labels 61 
repositioning 62 


RPS 81 

RPS (rotational position sensing) 
device 64 

R0 record


with ENQ macro 27 




SAM 81 

scheduler work area control block (SWA) 3, 37

SCT 81 

security 
checkpoint data set 5 
user data set 4 

serially reusable 
resources 

ENQ macro 26-28 

SETDEV macro 25 

SETL macro 27 

SETPRT macro 24 

shared DASD 28

shared resources 29 

SIOT 81 

sort/merge program 68 

special operating system features 
DEQ demount 28 
dynamic allocation 29 
shared DASD 28 
shared resources 29 
VIO data set 29
SPOOL pack 33 
SRTEDMCT field 61 
SSCR 81 
SSI 81 
standard volume label 61 
step time limit 67 
STIMER macro 25 
storage 
See virtual storage 
storage estimates 
described 13 
dynamic storage 16 
fragmented storage 16 
STOW macro 16, 54 
SWA 81 
SWA (scheduler work area) 3, 37 
SYSABEND data set 55 
SYSCHK DD statement 
deferred checkpoint restart 42 
described 3 
example 49 
how to code 49
restrictions 49 
SYSCHKEOV DD statement 10 
SYSCKEOV DD statement 
described 3 
sysgen 
See system generation 
SYSOUT data set 
at automatic restart 56 
at restart 56 
checkpoint positioning information 56 
with deferred checkpoint restart 56 
SYSOUT parameter 56 
SYSRES volume 33, 36 
system completion codes 
See completion codes 
system failure 
abend message 32 
automatic restart after 47 
during checkpoint entry 17 
possible cause 36 
saving checkpoint data 10 
termination during 67 
use of MOD 10 
use of RD parameter 
See RD parameter 
with JES3 4, 68 
with job journaling 4, 68 
system generation 
CKPTREST macro 3 
system library 
SYS1.IMAGELIB 25 
system operations 
at automatic restart 37 
at deferred checkpoint restart 42 
system restart 
job journal 4, 31, 67, 68 
SYSUDUMP data set 55
SYS1.IMAGELIB data set 25 
table
generation data group name 52 
terminal 57 
tape data set 
bypass label processing 61
checkpoint data set 10, 21 
DCB parameters 10 
labels 12, 21
MOD data set 
See MOD data set 
nonstandard labels 61 
unsuccessful completion 75 
with access methods 8 
with EXCP 62 
tape, magnetic 
See magnetic tape 
task control block 
See TCB 
TCAM 81 
TCAM (telecommunications access method) 57 
TCB 81 
TCB (task control block) 24 
TCBUSER field 66 
telecommunications access method 
See TCAM 
temporary data set 37 
terminal table for TCAM 57 
termination 
abnormal 
See abnormal termination 
accounting 66 
completed at system restart 4, 67 
generation data set 52 
messages issued 
See messages 
no restart 4 
records reprocessed 64 
with automatic restart 12 


See also automatic checkpoint restart, automatic
step restart
track overflow feature 
DCB option 11 


TTR 81 
UCB 81 


UCB (unit control block) 
SRTEDMCT field 61 

UCS 81 

UCS (universal character set) 8 
SETPRT macro 24 
unit control block
See UCB 
UNIT parameter 


considerations for checkpoint/restart 63 


universal character set 

See UCS 
UNLOAD command 33 
update in place 63 
updating a partitioned data set 54 
user data set 

security 

RACF 4 

user errors 65 
user repositioning routine 62 


validity checking 
DCB option 11 
VARY command 33 
VIO 81 
VIO data set 29
allocation 28, 33 
canceling 33 
with JES3 68 
with job journaling 4 
virtual storage 
at deferred checkpoint restart 46 
buffers 62 
contents written 1, 7
nonpageable 46 
pageable 3 
reasons unavailable 39, 46 
requirements 31, 33, 57
resource variations 46 
TCAM needs 57 
writing 11 
virtual storage access method 
See VSAM 
VOL parameter 
deferred checkpoint restart 45 
deferred step restart 41 
shared DASD 28 
volume 
at automatic restart 36 
at deferred checkpoint 46 
at deferred step 41 
changing at restart 33 
volume label 
standard 61 
volume mounting 
VSAM 38 
VSAM 81 
VSAM (virtual storage access method) 
checkpoint restart considerations 
cluster implicitly 28 
completion codes 57, 58 
data set 57, 58
entry-sequenced data set 59 
I/O errors 64
immediate-upgrade data set 59 
key-sequenced data set 59 
mounting volumes for 38 
relative record data set 60 
restrictions 57 

VSAM data set 
at checkpoint 8 
ENQ 28 
I/O errors 64
mounting 38 
repositioning 57, 58
types 59 

VTOC 81 


WAIT macro 27, 52

work data set 60

WRITE macro 53 

write validity checking 
checkpoint records 11 
DCB option 11 

WTOR macro 25, 72


YES reply 33 